Counting word frequencies in an article with Python

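The problem is this: you have a directory holding a month of your diary entries, all of them txt files. To avoid word segmentation issues, assume the content is entirely in English. Find the word you consider most important in each diary entry.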

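In practice this means finding the most frequent word in each file, while excluding common conjunctions, prepositions, predicate verbs and the like. The code is as follows: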

# coding=utf-8
import collections
import os
import re

# Common words that should not count as "important"
useless_words = ('the', 'a', 'an', 'and', 'by', 'of', 'in', 'on', 'is', 'to')


def get_important_word(file):
    # Count every word in the file, ignoring case
    word_counter = collections.Counter()
    with open(file) as f:
        for line in f:
            words = re.findall(r'\w+', line.lower())
            word_counter.update(words)
    # Walk the words from most to least frequent and report the
    # first one that is not in useless_words
    for word, num in word_counter.most_common():
        if word not in useless_words:
            print(file, word, num)
            break


if __name__ == '__main__':
    filepath = '.'
    # Visit every .txt file under the starting directory
    for dirpath, dirnames, dirfiles in os.walk(filepath):
        for file in dirfiles:
            if os.path.splitext(file)[1] == '.txt':
                abspath = os.path.join(dirpath, file)
                if os.path.isfile(abspath):
                    get_important_word(abspath)
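It may help to see what the tokenizing step actually produces. The sketch below applies the same lower() and '\w+' pattern to a single line; the helper name tokenize and the sample sentence are made up for illustration and are not part of the original script.

```python
import re


def tokenize(line):
    # Hypothetical helper: same pattern and lower-casing as the script
    return re.findall(r'\w+', line.lower())


sample = "Today I didn't go to the park, but read_me 2 books!"
tokens = tokenize(sample)
print(tokens)
```

Note that '\w' matches letters, digits and the underscore, so "didn't" is split into "didn" and "t", while "read_me" and "2" each survive as a single token. For diary text this is usually good enough, but stray fragments such as "t" or "s" may be worth adding to useless_words.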

Study notes:

1. The collections module is a built-in Python module that provides many useful container classes. Here we use the Counter class and its most_common() method.
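As a quick illustration of those two pieces, here is a minimal sketch of Counter on a small in-memory word list. The sample words are invented for the example.

```python
from collections import Counter

counter = Counter()
# update() adds the counts of every item in the iterable
counter.update(['the', 'cat', 'the', 'dog'])
counter.update(['the', 'cat'])

# most_common(n) returns the n most frequent (item, count) pairs,
# ordered from most to least frequent
top_two = counter.most_common(2)
print(top_two)

# With no argument, most_common() returns every pair in that order
everything = counter.most_common()
print(everything)

# A word that was never seen has a count of 0 rather than raising KeyError
print(counter['bird'])
```

Because most_common() with no argument already returns the full ranking, one can loop over it and stop at the first word that is not in the exclusion list, instead of calling most_common(count) repeatedly with a growing count.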
