Python Text Parsing: Character Counts and Word-Frequency Ranking

1. Counting the characters in a text

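fr = open('蘭亭集序.txt', 'rt', encoding='utf-8')
fw = open('蘭亭集序-字元統計.txt', 'wt', encoding='utf-8')
# Spaces and newlines ('\n') are not counted. strip() only trims the two ends,
# so replace() is used to drop them everywhere.
txt_str = fr.read().replace(' ', '').replace('\n', '')
# Walk the string and store each character and its count as a key-value pair
txt_dict = {}
for item in txt_str:
    # count() scans the whole string, so call it only once per distinct character
    if item not in txt_dict:
        txt_dict[item] = txt_str.count(item)
print('Counting finished!')
# Is there a way to assign the elements of two lists to a dict?
# txt_dict[key_list] = value_list does not work (a list cannot be a key);
# dict(zip(key_list, value_list)) does.
# Write the result as CSV, one "character,count" row per line
for key in txt_dict:
    parse_ls = [key, str(txt_dict[key])]
    fw.write(','.join(parse_ls) + '\n')
print('Writing finished!')
fr.close()
fw.close()  # the parentheses are required; a bare fw.close does nothing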
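The loop above calls count() once per distinct character, which rescans the whole string each time. As a rough alternative sketch, the standard library's collections.Counter can do the counting in a single pass, and dict(zip(...)) answers the question about building a dict from two lists. The helper names count_chars and dict_from_lists and the sample string are made up for illustration; they are not part of the original script.

```python
from collections import Counter


def count_chars(text, skip=' \n'):
    """Count every character in text in one pass, ignoring those in skip."""
    return Counter(ch for ch in text if ch not in skip)


def dict_from_lists(keys, values):
    """Pair two equal-length lists into a dict (keys[i] -> values[i])."""
    return dict(zip(keys, values))


# Hypothetical sample text standing in for the contents of the input file
sample = 'aab b\nc'
char_counts = count_chars(sample)

# Split the counts into two lists, then rebuild the dict from them
key_list = list(char_counts.keys())
value_list = list(char_counts.values())
rebuilt = dict_from_lists(key_list, value_list)

for ch, n in rebuilt.items():
    print('{},{}'.format(ch, n))
```

Counter is a dict subclass, so the CSV-writing loop from the script would work on it unchanged.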

2. Ranking the words in a text by frequency

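# Word-frequency ranking: print the 8 most frequent words
from jieba import lcut
# Read the text
f = open('蘭亭集序.txt', 'r', encoding='utf-8')  # 'utf-8' must be quoted
txt_str = f.read()
f.close()
# Spaces and punctuation: strip() only trims the two ends and cannot remove
# them all, so they are deleted from the dict after counting instead
# Segment the text into words
word_list = lcut(txt_str)  # could be shortened to word_ls
# Walk the words; each word:count pair becomes a key-value pair in the dict
count_dict = {}
for word in word_list:
    if word in count_dict:
        # already seen: add 1 to its count
        count_dict[word] += 1
    else:
        # first occurrence: the count is 1
        count_dict[word] = 1
## Alternative 2
##for word in word_list:
##        count_dict[word] = count_dict.get(word,0) + 1
# Delete spaces and punctuation
for item in ' \n，。！「」：':
    # pop() with a default avoids a KeyError when the item never occurred
    count_dict.pop(item, None)
# Rank the words and keep the 8 with the highest counts
# "Sorting" the dict: repeatedly pick the current maximum
ls = []
for i in range(min(8, len(count_dict))):
    # highest count found so far
    word_count = 0
    # word with that count
    max_count_word = ''
    for j in count_dict:
        if count_dict[j] > word_count:
            word_count = count_dict[j]
            max_count_word = j
    # store the most frequent word in the list
    ls.append(max_count_word)
    # delete it from the dict so the next pass finds the next most frequent word
    del count_dict[max_count_word]
# Print the result
print(','.join(ls))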
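The ranking above finds the maximum eight times and deletes entries from the dict as it goes. A shorter route, sketched here under the assumption that the counts are already in a dict, is to sort the (word, count) pairs once by count and slice off the first eight. The function name top_words and the sample counts are hypothetical, and the sketch leaves jieba out so that it runs on its own.

```python
def top_words(count_dict, n=8, skip=' \n，。！「」：'):
    """Return the n most frequent (word, count) pairs, ignoring tokens in skip.

    Unlike the delete-the-maximum loop, this does not modify count_dict.
    """
    skip_set = set(skip)
    kept = [(w, c) for w, c in count_dict.items() if w not in skip_set]
    # The sort is stable, so words with equal counts keep their original order
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:n]


# Hypothetical counts standing in for the dict built from lcut()
sample_counts = {'之': 3, '，': 5, '以': 2, '\n': 4, '也': 3}
ranked = top_words(sample_counts, n=2)
print(','.join(word for word, _ in ranked))
```

If the counts are held in a collections.Counter instead of a plain dict, its most_common(n) method gives the same kind of ranking, although the punctuation tokens still have to be filtered out first.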
