English Word Frequency Counting

2022-05-12 06:28:44

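Preprocessing for word frequency counting

Replace all separators such as , . ? ! ' : with spaces

Convert all uppercase letters to lowercase

Build the word list

Compute the word frequencies

Sort, excluding grammatical words such as pronouns, articles, and conjunctions

Output the 10 most frequent words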

word = '''
lately, i've been, i've been losing sleep
dreaming about the things that we could be
but baby, i've been, i've been praying hard,
said, no more counting dollars
we'll be counting stars, yeah we'll be counting stars
i see this life like a swinging vine
swing my heart across the line
and my face is flashing signs
seek it out and you shall find
old, but i'm not that old
young, but i'm not that bold
i don't think the world is sold
i'm just doing what we're told
i feel something so right
doing the wrong thing
i feel something so wrong
doing the right thing
i could lie, couldn't i, could lie
everything that kills me makes me feel alive
lately, i've been, i've been losing sleep
dreaming about the things that we could be
but baby, i've been, i've been praying hard,
said, no more counting dollars
we'll be counting stars

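'''

# Punctuation marks to replace with spaces
symbol = [",", ".", "!", "?", "'", ":", "-"]

# Meaningless fragments left over once apostrophes are removed (i've -> i, ve)
words = ['t', 've', 'll', 'm']

new_art = word
for s in symbol:
    new_art = new_art.replace(s, ' ')  # replace the punctuation in the text

new_art = new_art.lower()  # convert to lowercase

art_list = new_art.split()  # split the string on whitespace into a word list

# Record each word and its number of occurrences in a dictionary.
# Count in art_list, not in new_art: str.count matches substrings,
# so new_art.count('i') would also count the 'i' inside "sign" or "line".
dic = {}
for w in art_list:
    dic[w] = art_list.count(w)

for w in words:
    if dic.get(w) is not None:  # discard meaningless fragments if present
        dic.pop(w)

# Sort by count, highest first
new_dic = sorted(dic.items(), key=lambda x: x[1], reverse=True)

for i in range(10):
    print(new_dic[i])  # print the 10 most frequent words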
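As a more compact variant of the same steps, here is a minimal sketch that uses `re.findall` and `collections.Counter` from the standard library. The function name `top_words` and the contents of `STOP_WORDS` are assumptions made for illustration; the original post only drops the fragments `t`, `ve`, `ll`, and `m`.

```python
import re
from collections import Counter

# Hypothetical stop list: the contraction fragments from the post plus a few
# function words chosen for illustration only.
STOP_WORDS = {"t", "ve", "ll", "m", "re", "i", "the", "a", "and", "but", "that", "we", "be"}


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Lowercase first, then extract runs of letters; every other character
    # (punctuation, apostrophes, whitespace) acts as a separator.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Counter tallies whole tokens, so substrings are never miscounted.
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)


sample = "We'll be counting stars, yeah we'll be counting stars"
print(top_words(sample, n=2))
```

`Counter` counts the whole token list in one pass, whereas calling `list.count` once per word rescans the list each time. `most_common(n)` also returns fewer than `n` pairs when the text has fewer distinct words, so it cannot raise the `IndexError` that indexing `new_dic[i]` could on a short text.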

