Comparing string similarity with jieba word segmentation in Python

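So let's get straight to it. After searching online, jieba word segmentation turned out to fit our needs very well, so after reading a few examples I started writing a demo.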

The code is as follows:

import jieba
import numpy as np
import pymysql


class similarity:
    def __init__(self):
        self.db = pymysql.connect(host='localhost', port=3306, user='root',
                                  passwd='123456', db='dazhong', charset='utf8mb4')
        self.cursor = self.db.cursor()

    def get_word_vector(self, word1, word2):
        # lcut returns the tokens as a list, so commas inside the text
        # cannot corrupt the result the way join/split on ',' could
        list_word1 = jieba.lcut(word1)
        list_word2 = jieba.lcut(word2)
        # List every word: the union of both token lists
        key_word = list(set(list_word1 + list_word2))
        # Zero-filled vectors, one slot per word in key_word
        word_vector1 = np.zeros(len(key_word))
        word_vector2 = np.zeros(len(key_word))
        # Term frequency: count how often each key word occurs in each string
        for i, key in enumerate(key_word):
            word_vector1[i] = list_word1.count(key)
            word_vector2[i] = list_word2.count(key)
        # Print the vectors when debugging
        # print(word_vector1)
        # print(word_vector2)
        return self.cos_dist(word_vector1, word_vector2)

    def cos_dist(self, vec1, vec2):
        """
        :param vec1: vector 1
        :param vec2: vector 2
        :return: the cosine similarity of the two vectors
        """
        norm = np.linalg.norm(vec1) * np.linalg.norm(vec2)
        if norm == 0:
            # An empty string gives an all-zero vector; avoid dividing by zero
            return 0.0
        return float(np.dot(vec1, vec2) / norm)

    def contrast(self):
        self.cursor.execute("select hbh_store_name, hlj_store_name, id from error")
        for word1, word2, row_id in self.cursor.fetchall():
            num = self.get_word_vector(word1, word2)
            # Pass the values as parameters instead of formatting them into the SQL
            sql = "update error set word_similar = %s where id = %s"
            print(sql, (num, row_id))
            self.cursor.execute(sql, (str(num), row_id))
        # Commit once every row has been updated
        self.db.commit()


if __name__ == '__main__':
    sl = similarity()
    sl.contrast()
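
The read, score, and write-back loop of contrast() can be tried without a MySQL server. The following is only a rough sketch under stated assumptions: it uses the standard-library sqlite3 module with an in-memory database as a stand-in for MySQL, and the helper names update_similarity and exact_match, as well as the sample rows, are hypothetical. Note that sqlite3 uses ? as its placeholder, whereas pymysql uses %s.

```python
import sqlite3


def update_similarity(conn, score):
    """Score every row of `error` and store the result in word_similar."""
    cur = conn.cursor()
    cur.execute("select hbh_store_name, hlj_store_name, id from error")
    for name1, name2, row_id in cur.fetchall():
        # Placeholders keep quotes in store names from breaking the SQL.
        cur.execute("update error set word_similar = ? where id = ?",
                    (str(score(name1, name2)), row_id))
    conn.commit()


def exact_match(a, b):
    # Placeholder scorer (hypothetical): 1.0 for identical strings, else 0.0.
    return 1.0 if a == b else 0.0


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
conn.execute("create table error (id integer primary key, hbh_store_name text, "
             "hlj_store_name text, word_similar text)")
conn.executemany(
    "insert into error (id, hbh_store_name, hlj_store_name) values (?, ?, ?)",
    [(1, "A's cafe", "A's cafe"), (2, "A's cafe", "B bakery")])
update_similarity(conn, exact_match)
rows = conn.execute("select id, word_similar from error order by id").fetchall()
print(rows)
```

Any function that takes two strings and returns a number can be passed as score, so the jieba-based similarity could be dropped in once the loop itself is known to work.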

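The comparison returns a single value. The larger the value, the more similar the two strings are; the closer it is to 0, the less similar they are. Because the vectors hold non-negative word counts, the value always lies between 0 and 1.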
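
To make the score concrete, here is a minimal sketch of the same bag-of-words cosine similarity written with only the standard library. It assumes the strings have already been segmented into tokens (for example by jieba); the function name cosine_similarity and the hand-written token lists are illustrative and are not real jieba output.

```python
import math
from collections import Counter


def cosine_similarity(tokens1, tokens2):
    """Cosine similarity between two token lists, using term counts."""
    counts1, counts2 = Counter(tokens1), Counter(tokens2)
    # Dot product over the shared vocabulary (other terms contribute 0).
    dot = sum(counts1[t] * counts2[t] for t in counts1.keys() & counts2.keys())
    norm1 = math.sqrt(sum(c * c for c in counts1.values()))
    norm2 = math.sqrt(sum(c * c for c in counts2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty token list has no direction to compare
    return dot / (norm1 * norm2)


# Hand-written token lists standing in for segmenter output (assumed values).
same = cosine_similarity(["星巴克", "咖啡"], ["星巴克", "咖啡"])
partial = cosine_similarity(["星巴克", "咖啡"], ["星巴克", "門市"])
different = cosine_similarity(["星巴克", "咖啡"], ["麥當勞", "門市"])
print(same, partial, different)
```

Identical token lists score approximately 1, lists sharing one of two tokens score approximately 0.5, and lists with no token in common score 0.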

Common jieba word segmentation methods in Python

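jieba supports three segmentation modes. 1. Full mode scans out every word that can be formed from the sentence. Usage: jieba.cut(string, cut_all=True, HMM=False). 2. Accurate mode tries to split the text as precisely as possible and is suited to text analysis. Usage: jieba.cut(string, cut_all=False, HMM=True). 3. Search engine...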

Part-of-speech tagging with jieba in Python

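Part-of-speech tagging with file reading and writing. This is for a lab project; the task for now is to summarize text data. First, after looking at the text data, we need to extract the symbols: open('cut.txt', 'r', encoding='utf-8'); f1 = open('cut_result.txt', 'w', encoding='utf-8'); for line in f.readl...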

Using the jieba word segmentation library in Python

Test environment: py3, win10. import jieba; str_test = '有很多人擔心，美國一聲令下，會禁止所有的開源軟體被中國使用,這樣的擔憂是不必要的。'; the following calls return iterators: c1 = jieba.cut(str_test); c2 = jieba.cut(str_test, cut_all=True); c3 = jieba....
