Python: Word Segmentation and Word Weights with jieba

import jieba
import jieba.analyse
import xlrd


def stopwordslist(filepath):
    """Load stop words from a UTF-8 text file, one word per line."""
    with open(filepath, 'r', encoding='utf-8') as f:
        return [line.strip() for line in f]


def fenci(content):
    """Segment column 0 of the first sheet, one output line per data row."""
    table = content.sheets()[0]
    nrows = table.nrows  # number of rows
    row1 = 1  # start at row 1, skipping the header row
    final = ""
    while row1 < nrows:
        cell = table.cell(row1, 0).value
        for seg in jieba.cut(cell):
            # Keep only non-empty tokens that are not stop words.
            if seg not in stopwords and len(seg) > 0:
                final += seg + " "
        final += '\n'
        # print(row1, final)
        row1 += 1
    return final


# Load the custom dictionary; the file must be UTF-8 encoded.
jieba.load_userdict("c:\\users\\administrator\\desktop\\userdic.txt")
# Load the stop-word list.
stopwords = stopwordslist("c:\\users\\administrator\\desktop\\stop.txt")
# Load the data. Note: xlrd 2.0 and later reads only .xls files, so an
# .xlsx workbook needs an older xlrd or another reader such as openpyxl.
content = xlrd.open_workbook("c:\\users\\administrator\\desktop\\zhaopin_data.xlsx")
final = fenci(content)
# print(final)
keywords = jieba.analyse.extract_tags(final, topK=200, withWeight=True, allowPOS=())
# print(keywords)
for item in keywords:
    # if item[0] in ("sql", "python", "sas"):
    print(item[0], item[1])
# Based on the topK words printed, pick a few more to add to the stop-word list.
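
The weights printed by extract_tags are TF-IDF scores. To make that idea concrete, here is a minimal standard-library sketch of TF-IDF over already-segmented text. It is not jieba's implementation: jieba looks up IDF values in a prebuilt table, whereas this sketch assumes each line is one document and estimates IDF from the input itself with a smoothed formula. The function name tfidf_weights and its parameters are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(lines, top_k=5):
    """Rank words by TF-IDF, treating each non-empty line as one document.

    `lines` holds already-segmented text with words separated by spaces,
    the same shape as the lines of the `final` string built by fenci().
    Assumption: IDF is estimated from `lines`, not from a prebuilt table.
    """
    docs = [line.split() for line in lines if line.strip()]
    if not docs:
        return []

    # Term frequency: occurrences of each word across the whole text.
    tf = Counter(word for doc in docs for word in doc)
    total = sum(tf.values())

    # Document frequency: number of lines that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    n_docs = len(docs)

    weights = {}
    for word, count in tf.items():
        # Smoothed IDF, so a word present in every line still scores above zero.
        idf = math.log((1 + n_docs) / (1 + df[word])) + 1
        weights[word] = (count / total) * idf

    # Highest weight first; ties are broken alphabetically for stable output.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    sample = ["python sql python", "sql excel", "sql sas"]
    for word, weight in tfidf_weights(sample):
        print(word, round(weight, 4))
```

With the script above, something like tfidf_weights(final.splitlines(), top_k=200) would give a rough comparison against the extract_tags output. The rankings will differ, because the IDF values come from different sources.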

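Chinese word segmentation in Python with the jieba library

A very simple implementation, though at first I assumed it would be complicated. Just paste in the file from the appendix. # -*- coding: utf-8 -*- Created on Tue Mar 5 14:29:02 2019, @author: psdz. jieba is the library used for word segmentation: import jieba, import jieba.analyse. A library used for operating-system operations...

Installing the jieba package for Python and segmenting text

Run cmd, then pip install jieba. 2. Segmenting in full mode, precise mode, and search-engine mode: # encoding: utf-8, import jieba. Load a custom dictionary with jieba.load_userdict("dict.txt"). Full mode: text = "故宮的著名景點包括乾清宮太和殿和黃琉璃瓦等" seg l...

Basic usage of jieba in Python

Functions and their meanings. jieba.cut(string): precise mode; returns an iterable. jieba.cut(string, cut_all=True): full mode; outputs every possible word in the text string. jieba.cut_for_search(string): search-engine mode; produces a segmentation suited to building a search-engine index. jieba...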
