Python Text Word-Frequency Counting: English and Chinese

Counting the most frequent words in a passage of English text

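def get_text():
    with open('eng.txt', 'r') as f:
        text = f.read()
    text = text.lower()  # convert all words to lowercase
    # Denoise and normalize: replace every special character with a space
    for ch in '!@#$%^&*()_+-{}|\\<>?/.,`~;:"[]':
        text = text.replace(ch, " ")
    return text

text = get_text()
ls = text.split()  # split on any run of whitespace, so no empty strings are counted
counts = {}
for word in ls:
    counts[word] = counts.get(word, 0) + 1
items = list(counts.items())  # convert the dictionary to a list of (word, count) pairs
items.sort(key=lambda x: x[1], reverse=True)  # sort by the second element of each pair (the count), largest first
for word, fre in items[:5]:  # print the 5 most frequent words
    print(word, fre)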
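
The counting loop above can also be written more compactly with the standard library. The sketch below is an alternative, not the original author's code: it assumes the text is passed in as a string (instead of being read from eng.txt), uses `str.translate` to map every character in `string.punctuation` to a space in one pass, and lets `collections.Counter` do both the counting and the top-N selection. The function name `top_words` is hypothetical.

```python
import string
from collections import Counter


def top_words(text, n=5):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Map every ASCII punctuation character to a space in a single pass.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.lower().translate(table).split()
    # Counter tallies the words; most_common(n) returns the n largest counts.
    return Counter(words).most_common(n)


if __name__ == "__main__":
    sample = "The cat sat. The cat ran! A dog sat, the end."
    for word, fre in top_words(sample, 3):
        print(word, fre)
```

`most_common(n)` keeps tied words in first-seen order, so the result is deterministic for a given text. Because `string.punctuation` includes the apostrophe, a contraction such as "don't" becomes the two tokens "don" and "t".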

Counting the most frequently appearing names in Romance of the Three Kingdoms

import jieba

def get_text():
    with open('threeking.txt', 'r', encoding="gb18030") as f:
        return f.read()

text = get_text()
ls = jieba.lcut(text)  # segment the Chinese text into a list of words
# Characters or words that should not be counted
exclude = ['將軍', '不可', '二人', '荊州', '卻說', '不能', '如此', '商議', '如何', '主公',
           '軍士', '左右', '軍馬', '引兵', '次日', '大喜', '天下', '東吳', '丞相', '於是',
           '今日', '不敢', '陛下', '魏兵']
counts = {}  # dictionary used for counting
for word in ls:
    if len(word) == 1:  # skip single characters and punctuation marks
        continue
    elif word in exclude:
        continue
    # Merge the following alternate names into one canonical name
    elif word == '玄德' or word == '玄德曰':
        rword = '劉備'
    elif word == '諸葛亮' or word == '孔明曰':
        rword = '孔明'
    elif word == '關公' or word == '雲長':
        rword = '關羽'
    elif word == '都督':
        rword = '周瑜'
    elif word == '孟德':
        rword = '曹操'
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
items = list(counts.items())  # convert the dictionary to a list of (word, count) pairs
items.sort(key=lambda x: x[1], reverse=True)  # sort by the second element of each pair (the count), largest first
for word, fre in items[:10]:  # print the 10 most frequent words
    print(word, fre)
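
The if/elif chain that merges alternate names can also be expressed as a lookup table, so adding another alias becomes a one-line change. The sketch below assumes the text has already been segmented (for example by `jieba.lcut`), which lets it run without jieba installed; the names `ALIASES`, `EXCLUDE`, and `top_names` are hypothetical, and the alias and exclusion entries are copied from the program above.

```python
from collections import Counter

# Alias -> canonical name, mirroring the elif branches above.
ALIASES = {
    '玄德': '劉備', '玄德曰': '劉備',
    '諸葛亮': '孔明', '孔明曰': '孔明',
    '關公': '關羽', '雲長': '關羽',
    '都督': '周瑜',
    '孟德': '曹操',
}
# Words that are frequent but are not names.
EXCLUDE = {'將軍', '不可', '二人', '荊州', '卻說', '不能', '如此', '商議',
           '如何', '主公', '軍士', '左右', '軍馬', '引兵', '次日', '大喜',
           '天下', '東吳', '丞相', '於是', '今日', '不敢', '陛下', '魏兵'}


def top_names(words, n=10):
    """Count segmented words, skipping single characters and excluded words."""
    counts = Counter()
    for word in words:
        if len(word) == 1 or word in EXCLUDE:
            continue
        # Fall back to the word itself when it has no alias entry.
        counts[ALIASES.get(word, word)] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    tokens = ['玄德', '曰', '：', '孔明曰', '雲長', '卻說', '諸葛亮', '劉備', '關羽']
    print(top_names(tokens, 3))
```

Storing the excluded words in a set makes each membership test a hash lookup instead of a scan of the whole list; the counts come out the same either way.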
