Approach: read the source file to build a list of tokens, read the stop word list and remove the unwanted tokens, count the word frequencies, set the word cloud parameters, and finally draw the word cloud.
"""匯入相關庫"""
import jieba
import pandas as pd
from imageio import imread
from wordcloud import WordCloud
from matplotlib import pyplot as plt
"""讀取原始檔,然後形成分詞列表"""
with open('e:/yuanwenjian.txt', 'r', encoding='utf-8') as f:
    txt = f.read()
txt = txt.split()
data_cut = [jieba.lcut(x) for x in txt]  # tokenized result: a two-dimensional list (a list of lists)
all_words = []  # flatten into a one-dimensional list (of strings)
for i in data_cut:
    all_words.extend(i)
# all_words.count('詞語')  # count how often a single word appears
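# A one-pass alternative (an optional sketch; Counter and word_freq are names
# introduced here): list.count rescans the whole list on every call, while
# collections.Counter tallies every word at once.
from collections import Counter
word_freq = Counter(all_words)  # word_freq['詞語'] gives that word's count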
"""讀取停用詞文件"""
with open("e:\\stopwords.txt",'r',encoding='utf-8') as f:
stop=f.read()
stop = stop.split()
stop = [' ']+stop
data_after = [[j for j in i if j not in stop] for i in data_cut] #判斷是否為停用詞
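# Performance note (an optional sketch; stop_set is a name introduced here):
# "j not in stop" scans the whole list for every token, while a set gives
# constant-time lookups. This produces the same data_after:
stop_set = set(stop)
data_after = [[j for j in i if j not in stop_set] for i in data_cut]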
"""統計詞頻"""
all_words = []  # rebuild the flat word list, this time without stop words
for i in data_after:
    all_words.extend(i)
num = pd.Series(all_words).value_counts()  # word -> frequency, sorted in descending order
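# Quick sanity check (optional): inspect the most frequent words before drawing.
print(num.head(10))  # top 10 words and their counts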
"""讀取背景"""
pic = imread('e:/素材/logo/python.png')
"""詞云引數"""
wc = WordCloud(background_color='white', font_path='c:\\windows\\fonts\\simkai.ttf', mask=pic)  # font_path must point to a font with Chinese glyphs, or the words render as boxes
'''A variant with more parameters:
wc = WordCloud(background_color='white',
               font_path='c:\\windows\\fonts\\simkai.ttf',
               max_words=200,
               max_font_size=10,
               mask=pic)
'''
wc2 = wc.fit_words(num)  # pass the word frequencies into the cloud
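# fit_words expects a word-to-frequency mapping; a pandas Series works because
# it exposes .items(). In the wordcloud library, fit_words is essentially an
# alias for generate_from_frequencies, so an equivalent call would be:
# wc2 = wc.generate_from_frequencies(dict(num))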
"""詞云展示"""
plt.figure(figsize=(9, 9))  # figure size
plt.imshow(wc2)
plt.axis('off')  # hide the axes
plt.show()
wc.to_file("ciyun.png")  # save the word cloud image
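# Note: wc.to_file saves the word cloud at the mask image's own resolution,
# whereas plt.savefig would save the rendered matplotlib figure instead.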