Counting Word Frequencies in an English Article with Python

import re

# Read the whole file into a string
with open("text.txt", encoding="utf-8") as f:
    txt = f.read()

# Remove punctuation, digits, possessive 's and apostrophes
txt = re.sub(r"[,.()\":;!@#$%^&*\d]|'s|'", "", txt)

# Lowercase, then split on any whitespace (newlines, repeated spaces, tabs)
word_list = txt.lower().split()

# Count how often each word occurs
word_count_dict = {}
for word in word_list:
    if word in word_count_dict:
        word_count_dict[word] += 1
    else:
        word_count_dict[word] = 1

# Sort by occurrence count, highest first
word_count_list = sorted(word_count_dict.items(), key=lambda x: x[1], reverse=True)

# Write "word<TAB>count" lines to the output file
with open("word_count.txt", "w", encoding="utf-8") as f1:
    for word, count in word_count_list:
        f1.write("%s\t%s\n" % (word, count))
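For comparison, the counting and sorting steps can also be done with the standard library's collections.Counter, whose most_common() method returns (word, count) pairs ordered by count, highest first. This is a swapped-in technique rather than the author's method; the function name word_frequencies and the sample sentence are hypothetical, and the cleaning regex is the one used in the script above.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return (word, count) pairs sorted by count, highest first.

    Cleaning rule (same as the script above): strip the listed punctuation,
    digits, possessive 's and apostrophes, then lowercase.
    """
    cleaned = re.sub(r"[,.()\":;!@#$%^&*\d]|'s|'", "", text)
    words = cleaned.lower().split()
    # Counter replaces the manual dict-counting loop;
    # most_common() sorts the pairs by count in descending order.
    return Counter(words).most_common()


if __name__ == "__main__":
    sample = "The cat's toy. The dog, the cat!"  # hypothetical sample text
    for word, count in word_frequencies(sample):
        print("%s\t%s" % (word, count))
```

Words with equal counts keep the order in which they first appear in the text, so in the sample above "toy" is listed before "dog".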

The output looks roughly like this:

the 8
to 6
a 6
has 3
us 2
criminal 2
subpoenas 2
president 2

There are still quite a few rough edges; I'll improve them as I keep learning!
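One possible improvement, offered as an assumption rather than the author's fix: the regex in the script only deletes a fixed list of characters, so punctuation such as ? or - stays attached to words or is counted as a word. Extracting words with re.findall avoids having to list every symbol. The pattern WORD_RE and the helpers tokenize and top_words are hypothetical names, and the pattern assumes plain ASCII English words with an optional apostrophe part.

```python
import re
from collections import Counter

# Hypothetical word pattern: a run of ASCII letters, optionally followed by
# an apostrophe part (e.g. "don't"). Adjust it for hyphenated words or
# non-ASCII text.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text and return the list of words it contains."""
    words = WORD_RE.findall(text.lower())
    # Drop a trailing possessive 's, as the original script does
    # (note this also turns "it's" into "it", same as the original).
    return [w[:-2] if w.endswith("'s") else w for w in words]


def top_words(text, n=10):
    """Return the n most frequent words as (word, count) pairs."""
    return Counter(tokenize(text)).most_common(n)
```

For example, top_words("The cat's hat? The hat -- the end.", 2) returns [("the", 3), ("hat", 2)]; the "?" and "--" never become part of a word.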

