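Count the words used in the English nursery rhyme "Twinkle Twinkle Little Star" and how many times each one occurs. Ignore letter case, do not count punctuation marks, and print the results in descending order of frequency.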
twinkle, twinkle, little star,
how i wonder what you are!
up above the world so high,
like a diamond in the sky.
twinkle, twinkle, little star,
how i wonder what you are!
when the blazing sun is gone,
when he nothing shines upon,
then you show your little light,
twinkle, twinkle, all the night.
twinkle, twinkle, little star,
how i wonder what you are!
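The task says to ignore case and not to count punctuation, so the first step is to convert the text to lowercase and replace the punctuation marks with spaces. Next, split() breaks the string on whitespace and returns a list of words. Then count how many times each element of the list occurs and collect the counts in a dictionary. Finally, use sorted() to order the dictionary's (word, count) items by count.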
# The lyrics as one string; adjacent string literals are concatenated.
message = "twinkle, twinkle, little star,how i wonder what you are!" \
          "up above the world so high,like a diamond in the sky." \
          "twinkle, twinkle, little star,how i wonder what you are!" \
          "when the blazing sun is gone,when he nothing shines upon," \
          "then you show your little light,twinkle, twinkle, all the night." \
          "twinkle, twinkle, little star,how i wonder what you are!"
# Lowercase the text and turn each punctuation mark into a space.
message = message.lower().replace(',', ' ').replace('.', ' ').replace('!', ' ')
# Split on whitespace to get a list of words.
list_message = message.split()
# Tally each word in a dictionary.
count = {}
for word in list_message:
    if word not in count:
        count[word] = 1
    else:
        count[word] += 1
# Sort the (word, count) pairs by count, highest first.
print(sorted(count.items(), key=lambda item: item[1], reverse=True))
Output:
[('twinkle', 8), ('little', 4), ('you', 4), ('the', 4), ('star', 3), ('how', 3), ('i', 3), ('wonder', 3), ('what', 3), ('are', 3), ('when', 2), ('up', 1), ('above', 1), ('world', 1), ('so', 1), ('high', 1), ('like', 1), ('a', 1), ('diamond', 1), ('in', 1), ('sky', 1), ('blazing', 1), ('sun', 1), ('is', 1), ('gone', 1), ('he', 1), ('nothing', 1), ('shines', 1), ('upon', 1), ('then', 1), ('show', 1), ('your', 1), ('light', 1), ('all', 1), ('night', 1)]
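The same steps (lowercase, drop punctuation, tally, sort) can also be sketched more compactly with the standard library. This is only one possible variant, not the method used above: the names LYRICS and count_words are made up for illustration, and the pattern [a-z]+ assumes the text contains only plain ASCII words, so a contraction such as "don't" would be split in two.

```python
import re
from collections import Counter

# The lyrics from the exercise, kept as a multi-line string (hypothetical name).
LYRICS = """\
twinkle, twinkle, little star,
how i wonder what you are!
up above the world so high,
like a diamond in the sky.
twinkle, twinkle, little star,
how i wonder what you are!
when the blazing sun is gone,
when he nothing shines upon,
then you show your little light,
twinkle, twinkle, all the night.
twinkle, twinkle, little star,
how i wonder what you are!
"""


def count_words(text):
    """Return (word, count) pairs ordered from most to least frequent."""
    # Lowercase first, then keep only runs of letters; punctuation is skipped.
    words = re.findall(r"[a-z]+", text.lower())
    # most_common() with no argument returns every pair, highest count first.
    return Counter(words).most_common()


result = count_words(LYRICS)
print(result)
```

Counter does the tallying that the if/else loop does by hand, and most_common() replaces the explicit sorted() call. Words with equal counts stay in the order they first appear, so the output should match the list shown above.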