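In everyday work we often need to find the words that appear most frequently in an article or a web page, or to rank words by how often they occur. How can this be done?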
For example, given the input sentence "Hello there this is a test. Hello there this was a test, but now it is not.", we want the result in ascending order:
[[1, 'but'], [1, 'it'], [1, 'not.'], [1, 'now'], [1, 'test,'], [1, 'test.'], [1, 'was'], [2, 'Hello'], [2, 'a'], [2, 'is'], [2, 'there'], [2, 'this']]
and in descending order:
[[2, 'this'], [2, 'there'], [2, 'is'], [2, 'a'], [2, 'Hello'], [1, 'was'], [1, 'test.'], [1, 'test,'], [1, 'now'], [1, 'not.'], [1, 'it'], [1, 'but']]
The sentence is split on whitespace only, so punctuation stays attached to words and 'test.' and 'test,' are counted separately.
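The orderings above follow from how Python compares lists: `[count, word]` pairs are compared element by element, so the count decides first and the word breaks ties. String comparison uses code points, so a capitalised word such as 'Hello' sorts before 'a'. A minimal check, using a few pairs taken from the example:

```python
# Lists compare lexicographically: first by count, then by the word.
pairs = [[2, 'this'], [1, 'was'], [2, 'a'], [2, 'Hello'], [1, 'but']]

ascending = sorted(pairs)
print(ascending)   # [[1, 'but'], [1, 'was'], [2, 'Hello'], [2, 'a'], [2, 'this']]

# Reversing the ascending list flips both keys, so words with equal counts
# end up in reverse alphabetical order.
descending = ascending[::-1]
print(descending)  # [[2, 'this'], [2, 'a'], [2, 'Hello'], [1, 'was'], [1, 'but']]

# Uppercase letters have smaller code points than lowercase ones.
print(ord('H'), ord('a'))  # 72 97
```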
The code that produces this result is as follows:
```python
class Counter(object):
    """Count how many times each item has been added."""

    def __init__(self):
        self.dict = {}

    def add(self, item):
        # setdefault returns the current count, inserting 0 for a new item
        count = self.dict.setdefault(item, 0)
        self.dict[item] = count + 1

    def counts(self, desc=None):
        # Build [count, word] pairs so sorting orders by count first, then by word
        result = [[val, key] for (key, val) in self.dict.items()]
        result.sort()
        if desc:
            result.reverse()
        return result


if __name__ == '__main__':
    '''Produces:
    Ascending count:
    [[1, 'but'], [1, 'it'], [1, 'not.'], [1, 'now'], [1, 'test,'], [1, 'test.'], [1, 'was'], [2, 'Hello'], [2, 'a'], [2, 'is'], [2, 'there'], [2, 'this']]
    Descending count:
    [[2, 'this'], [2, 'there'], [2, 'is'], [2, 'a'], [2, 'Hello'], [1, 'was'], [1, 'test.'], [1, 'test,'], [1, 'now'], [1, 'not.'], [1, 'it'], [1, 'but']]
    '''
    sentence = "Hello there this is a test. Hello there this was a test, but now it is not."
    words = sentence.split()
    c = Counter()
    for word in words:
        c.add(word)
    print("Ascending count:")
    print(c.counts())
    print("Descending count:")
    print(c.counts(desc=True))
```
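The standard library already provides a dictionary-based counter, `collections.Counter`, which can stand in for the hand-written class. The sketch below, assuming Python 3.7 or later, rebuilds the same `[count, word]` lists and also shows `most_common(n)` for the "most frequent words" use case from the introduction. The helper name `count_pairs` is hypothetical. Sorting with the key `(-count, word)` is shown as one alternative to `reverse()` that keeps tied words in alphabetical order.

```python
from collections import Counter


def count_pairs(words, desc=False):
    """Hypothetical helper: return [count, word] pairs like Counter.counts() above."""
    freq = Counter(words)  # maps each word to the number of times it appears
    result = sorted([count, word] for word, count in freq.items())
    if desc:
        result.reverse()
    return result


sentence = "Hello there this is a test. Hello there this was a test, but now it is not."
words = sentence.split()

print(count_pairs(words))
print(count_pairs(words, desc=True))

# Top 3 words; equal counts keep the order in which words were first seen.
print(Counter(words).most_common(3))  # [('Hello', 2), ('there', 2), ('this', 2)]

# Descending by count, but alphabetical among words with the same count.
by_count_then_word = sorted(Counter(words).items(), key=lambda kv: (-kv[1], kv[0]))
print(by_count_then_word[:3])  # [('Hello', 2), ('a', 2), ('is', 2)]
```

`Counter` handles the "missing key starts at 0" logic that the original class implements with `setdefault`, so only the sorting step remains to be written by hand.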