Analyzing a Novel with jieba Word Segmentation (Part 1)

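A txt file containing the characters' names.

A txt file of Chinese stop words.

The jieba library, installed.

After segmenting the text with jieba.cut(), count how many times each character appears.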

import jieba
import pickle
import jieba.analyse

names = {}       # character name -> number of appearances
all_names = []   # every character name read from name.txt
sentence = []    # used to store the segmentation results
text_path = '/users/xh/desktop/bishe/longzu.txt'
jieba.load_userdict('/users/xh/desktop/bishe/name.txt')
jieba.analyse.set_stop_words('/users/xh/desktop/bishe/stopwords.txt')

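load_userdict() adds a custom dictionary. Here the character names are added to it so that jieba keeps each name as a single token. The stop words are loaded as well, which makes the results more accurate. Note that jieba.analyse.set_stop_words() applies to keyword extraction with jieba.analyse.extract_tags(); tokens returned by jieba.cut() are not filtered automatically.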
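Since the name file and the stop word file are both plain one-entry-per-line text, a small helper can read either of them into a set for fast membership checks. The following is only a minimal sketch: `load_word_set` is a hypothetical name, not part of jieba, and it assumes the files are UTF-8. It keeps only the first field of each line, so a user dictionary line in jieba's "word frequency tag" format still yields just the word.

```python
import os
import tempfile


def load_word_set(path, encoding="utf-8"):
    """Read a one-entry-per-line word file into a set.

    Only the first whitespace-separated field of each line is kept, so a
    jieba user dictionary line such as "路明非 100 nr" yields "路明非".
    """
    words = set()
    with open(path, "r", encoding=encoding) as fh:
        for line in fh:
            fields = line.split()
            if fields:  # skip blank lines
                words.add(fields[0])
    return words


# Small demonstration with a temporary file standing in for name.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("路明非 100 nr\n\n諾諾\n")
    demo_path = tmp.name

demo_names = load_word_set(demo_path)
os.remove(demo_path)
print(sorted(demo_names))
```

A set is used rather than a list because the counting loop tests every token for membership, and set lookups stay fast even for a long stop word list.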

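f = open(text_path, 'r')
f1 = open('/users/xh/desktop/bishe/name.txt', 'r')
# Read each character's name
for line in f1.readlines():
    # The original statement is truncated here; appending the stripped line is assumed
    all_names.append(line.strip())
f1.close()
# `stopword` is not defined in the listing as it survives;
# it is assumed to hold the words from stopwords.txt
with open('/users/xh/desktop/bishe/stopwords.txt', 'r') as f2:
    stopword = set(w.strip() for w in f2)
# Segment the text
for line in f.readlines():
    # Read the text line by line
    seg_list = jieba.cut(line, cut_all=False)
    unique_list = []
    for i in seg_list:
        if i not in stopword:
            if i in all_names:
                if names.get(i) is None:
                    names[i] = 0
                names[i] += 1
print(names)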

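jieba.cut returns an iterable (a generator), so a for loop is all that is needed to traverse it. After this step, names holds the number of appearances of each character.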
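The counting step can also be written with `collections.Counter`, which removes the need to check whether a key already exists. This is a sketch rather than the code used for the results in this post: `count_names` is a hypothetical helper, and the token list is hand-written to stand in for the output of jieba.cut(line) so that the example runs without the novel.

```python
from collections import Counter


def count_names(tokens, all_names, stopwords=frozenset()):
    """Count how often each known character name occurs in a token stream."""
    counts = Counter()
    for token in tokens:
        if token in stopwords:
            continue
        if token in all_names:
            counts[token] += 1
    return counts


# Hand-written tokens standing in for the output of jieba.cut(line).
demo_tokens = ["路明非", "看", "了", "諾諾", "一眼", "，", "路明非", "沒", "說話"]
demo_counts = count_names(demo_tokens, {"路明非", "諾諾"}, {"了", "，"})
for name, n in demo_counts.most_common():
    print(name, n)
```

Because the function accepts any iterable, the generator returned by jieba.cut can be passed in directly, and `most_common()` gives the characters sorted by number of appearances.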

Here is the output of the run:

/users/xh/desktop/lianxi/venv/bin/python /users/xh/desktop/bishe/sada.py
building prefix dict from the default dictionary ...
loading model from cache /var/folders/8n/5s94235n4_jgw7tzm316c4n80000gp/t/jieba.cache
loading model cost 0.971 seconds.
prefix dict has been built succesfully.

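The counts make it clear that 路明非 is the protagonist, appearing far more often than anyone else, and that the female lead is 諾諾. This matches the setup of the book, which suggests that the segmentation and counting results are basically sound.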

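# Keyword extraction: the top 20 keywords together with their weights
f.seek(0)  # f was already consumed by readlines() above, so rewind before reading again
text = f.read()
tags = jieba.analyse.extract_tags(text, topK=20, withWeight=True)
print()
for k, v in tags:
    # The original format string is missing; a plain "keyword<TAB>weight" layout is assumed
    print('{}\t{}'.format(k, v))
f.close()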

Output:

Here we can see that the protagonist 路明非 has a very high weight.
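To see why a character name can outrank more frequent everyday words, it helps to recall that extract_tags ranks words by TF-IDF: how often a word occurs in the text, scaled by how rare the word is in general. The sketch below is a simplified illustration and not jieba's implementation. `tfidf_top_k` is a hypothetical function, and the IDF values are invented for the demonstration, whereas jieba ships its own IDF table.

```python
from collections import Counter


def tfidf_top_k(tokens, idf, default_idf, top_k=20):
    """Rank tokens by term frequency times inverse document frequency."""
    counts = Counter(t for t in tokens if len(t) >= 2)  # skip 1-char tokens
    total = sum(counts.values())
    if total == 0:
        return []
    weights = {
        word: (n / total) * idf.get(word, default_idf)
        for word, n in counts.items()
    }
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Invented IDF values: a rare name gets a high score, a common word a low one.
demo_idf = {"路明非": 10.0, "學院": 5.0, "一個": 1.0}
demo_tokens = ["路明非", "一個", "學院", "路明非", "一個", "一個", "的"]
demo_tags = tfidf_top_k(demo_tokens, demo_idf, default_idf=5.0, top_k=2)
for word, weight in demo_tags:
    print("{}\t{:.3f}".format(word, weight))
```

In this toy input, 一個 is the most frequent token, yet 路明非 ranks first because its assumed IDF is much higher. The same effect explains how a name that is rare in general text but frequent in the novel ends up with a high weight.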
