Simple word-frequency counting in Python with jieba

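import jieba  # not needed for the English text below; used in the second example


def gettext():
    # Read the whole file and normalise it to lower case
    with open("hamlet.txt", "r") as f:
        txt = f.read()
    txt = txt.lower()
    # Strip punctuation so that "word," and "word" are counted together
    for ch in '|"#$%&()*+,-./:;<>+?@[\\]^_~':
        txt = txt.replace(ch, "")
    return txt


harmtxt = gettext()
words = harmtxt.split()

# Count how often each word occurs
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

items = list(counts.items())
# Sort by the second element (the count), largest first
items.sort(key=lambda x: x[1], reverse=True)

# Print the ten most frequent words
for i in range(10):
    word, count = items[i]
    print(word, end=":")
    print(count)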

Output:

the:1138

and:965

to:754

of:668

you:549

a:542

i:540

my:514

hamlet:456

in:436
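
The counting loop above can also be written with the standard library's collections.Counter. The following is only a sketch of that alternative, not the original author's code: the function name top_words and the sample sentence are made up for illustration, and the punctuation set is copied from the listing above.

```python
from collections import Counter

# Same punctuation characters as in the listing above.
PUNCTUATION = '|"#$%&()*+,-./:;<>+?@[\\]^_~'


def top_words(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs.

    top_words is a hypothetical helper name used only in this sketch.
    """
    # With three arguments, str.maketrans builds a table that deletes
    # every character listed in the third argument.
    table = str.maketrans("", "", PUNCTUATION)
    words = text.lower().translate(table).split()
    return Counter(words).most_common(n)


# Illustrative input; the real script would pass the contents of hamlet.txt.
for word, count in top_words("To be, or not to be: that is the question.", 3):
    print(word, count, sep=":")
```

Counter.most_common does the sorting and slicing in one call, so the explicit list, sort and range(10) steps are no longer needed.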

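import jieba

with open("threekingdoms.txt", "r", encoding="utf-8") as f:
    txt = f.read()

# Words that are not character names.
# The list itself is missing from the source, so it is left empty here.
excludes = set()

# Segment the Chinese text into a list of words
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:
        # Skip single-character tokens
        continue
    # Merge the different names used for the same character
    elif word == "諸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "關公" or word == "雲長":
        rword = "關羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "劉備"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1

# Drop the excluded words from the tally
for word in excludes:
    del counts[word]

items = list(counts.items())
# Sort by count, largest first
items.sort(key=lambda x: x[1], reverse=True)

# Print the ten most frequent words
for i in range(10):
    word, count = items[i]
    print(word, end=":")
    print(count)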

Output:

曹操:1451

孔明:1383

劉備:1252

關羽:784

張飛:358

軍士:317

呂布:300

軍馬:293

趙雲:278

次日:271
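
The chain of elif branches that merges aliases can be replaced by a lookup table. The sketch below is an assumption about how one might restructure it, not the original code: count_names and ALIASES are hypothetical names, the alias pairs are taken from the branches above, and the token list is a made-up sample so the block runs without jieba or the novel's text.

```python
from collections import Counter

# Alias -> canonical name, copied from the elif branches in the listing above.
ALIASES = {
    "諸葛亮": "孔明", "孔明曰": "孔明",
    "關公": "關羽", "雲長": "關羽",
    "玄德": "劉備", "玄德曰": "劉備",
    "孟德": "曹操", "丞相": "曹操",
}


def count_names(tokens, aliases=ALIASES, excludes=()):
    """Count tokens, merging aliases and dropping excluded words.

    count_names is a hypothetical helper name used only in this sketch.
    """
    counts = Counter()
    for token in tokens:
        if len(token) == 1:
            # Skip single-character tokens, as in the original loop
            continue
        # Map an alias to its canonical name; other tokens pass through
        counts[aliases.get(token, token)] += 1
    for word in excludes:
        # pop with a default does not fail if the word never occurred
        counts.pop(word, None)
    return counts


# Made-up sample tokens standing in for the segmenter's output.
tokens = ["玄德", "曰", "孔明", "諸葛亮", "玄德曰", "次日"]
print(count_names(tokens, excludes={"次日"}).most_common(2))
```

In the full script the tokens would come from jieba.lcut(txt). Adding a new alias then means adding one dictionary entry rather than another branch.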

