Counting word frequencies in an article and printing them in descending order

Data:
onelife.txt
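import re
from string import punctuation

# Read the file (the path 'd:' appears truncated in the original; the data file is onelife.txt)
with open('d:', encoding='utf-8') as f1:
    contents = f1.readlines()

# Define an empty dict for the word counts, outside the loop so counts accumulate over all lines
wordcount = {}
# Iterate over the words on each line
for content in contents:
    # Convert letters to lowercase
    content = content.lower()
    # Replace punctuation (ASCII punctuation plus the Chinese book-title marks 《》) with spaces
    content = re.sub('[{}]'.format(re.escape(punctuation + '《》')), ' ', content)
    # Split the line into a list of words
    words = content.split()
    for word in words:
        # If the word is already in the dict, add 1; otherwise start it at 1
        if word in wordcount:
            wordcount[word] += 1
        else:
            wordcount[word] = 1

# Convert the dict into a sequence of (word, count) pairs
wordcount = wordcount.items()
# Sort by frequency in descending order
items = sorted(wordcount, key=lambda x: x[1], reverse=True)
for word, count in items:
    print(word, count)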
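For comparison, the same count-and-sort can be written more compactly with the standard library's `collections.Counter`. This is a swapped-in technique, not the method used in the loop above; the helper name `count_words` and the sample sentences are hypothetical. It applies the same lowercasing and punctuation filtering before counting.

```python
import re
from collections import Counter
from string import punctuation

# Character class matching ASCII punctuation plus the Chinese book-title marks 《》,
# mirroring the filter used in the loop above
PUNCT_RE = re.compile('[{}]'.format(re.escape(punctuation + '《》')))


def count_words(lines):
    """Hypothetical helper: return (word, count) pairs sorted by count, highest first."""
    counter = Counter()
    for line in lines:
        # Lowercase, replace punctuation with spaces, then split on whitespace
        cleaned = PUNCT_RE.sub(' ', line.lower())
        counter.update(cleaned.split())
    # most_common() with no argument returns every item ordered by count, descending
    return counter.most_common()


if __name__ == '__main__':
    sample = ["One life, one chance.", "Live this life!"]
    for word, count in count_words(sample):
        print(word, count)
```

`Counter.most_common()` keeps words with equal counts in the order they were first encountered, so for the sample above "one" and "life" (2 each) come before "chance", "live" and "this" (1 each).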
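# Import islice so the header row can be skipped
from itertools import islice

# Define an empty dict
direct = {}
# Read the file
with open('d://countries_zh.csv', encoding='utf-8') as f1:
    # Skip the first (header) line
    for line in islice(f1, 1, None):
        # Split each line into a list of fields
        item = line.split(',')
        # Strip the trailing newline from the fifth field and convert it to an integer
        item[4] = int(item[4].strip())
        # Store the row as key:value, where the key is the first four fields joined by commas
        direct[','.join(item[:4])] = item[4]

# Convert the dict into a sequence of (key, value) pairs
direct = direct.items()
# Finally, sort by value in descending order
sorted_rows = sorted(direct, key=lambda x: x[1], reverse=True)
for key, value in sorted_rows:
    print(key, value)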
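Splitting each line on ',' by hand breaks if a field contains a quoted comma. A minimal alternative sketch uses the standard `csv` module instead, assuming the file has one header row followed by rows of five fields whose last field is an integer (the layout the snippet above implies). The helper name `sort_rows_by_last_column` and the sample rows are hypothetical stand-ins, not the contents of countries_zh.csv.

```python
import csv
import io


def sort_rows_by_last_column(f):
    """Hypothetical helper: skip the header, key each row by its first four
    fields joined with commas, and sort by the integer fifth field, highest first."""
    reader = csv.reader(f)
    next(reader, None)  # skip the header row (no error if the file is empty)
    totals = {}
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines; ignore them
        key = ','.join(row[:4])
        totals[key] = int(row[4])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == '__main__':
    # Hypothetical sample standing in for a real CSV file
    sample = io.StringIO(
        'a,b,c,d,value\n'
        'x1,"y, with comma",z1,w1,10\n'
        'x2,y2,z2,w2,30\n'
    )
    for key, value in sort_rows_by_last_column(sample):
        print(key, value)
```

With a real file, the `csv` documentation recommends opening it with `newline=''`, e.g. `open(path, encoding='utf-8', newline='')`, and passing the file object to the helper.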