Word Cloud Analysis of Reviews of The Great Wall

2021-07-25 14:46:40

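The film The Great Wall has been very popular lately and also very controversial. Using the short reviews at the bottom of its Douban page, I analyze how audiences see this film.

This is the link to the short reviews of The Great Wall.

The code is given below: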

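library(XML)
library(RCurl)
library(stringr)
library(Rwordseg)
library(tm)
library(wordcloud2)
library(wordcloud)

# URL of the Douban short-review page (left blank in the original)
url <- ""
# Read the page source line by line
html_form <- readLines(url, encoding = "UTF-8")

a <- c(26, 52, 80, 114, 136, 161, 187)
# The loop body never uses i, so every pass produces the same result
for (i in a)
  content <- html_form[str_detect(html_form, 'class="">')]

# Drop the first 20 characters (leading markup) from each matched line
contents <- str_sub(content, start = 21)
# Segment the Chinese text into words
segment <- segmentCN(contents)
segment_unlist <- unlist(segment)
# Build a corpus and a document-term matrix from the segmented words
corpus <- Corpus(VectorSource(segment_unlist))
dm_mat <- DocumentTermMatrix(corpus)
dmmat <- as.matrix(dm_mat)
# Sum each column to get the total frequency of every term
dmmat_colsum <- colSums(dmmat)
df <- data.frame(name = names(dmmat_colsum),
                 freq = as.numeric(dmmat_colsum), stringsAsFactors = FALSE)

# Interactive word cloud (shape value kept as in the original)
wordcloud2(df, shape = "r")
# Most frequent term
df[which.max(df$freq), ]
# Static word cloud of terms that appear at least twice
wordcloud(df$name, df$freq, min.freq = 2, random.color = TRUE,
          colors = rainbow(24), scale = c(4, 1))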
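
The frequency-table step can be checked on its own, without any of the packages above. What follows is a minimal sketch in base R, not the method used in the code above: it swaps the tm document-term matrix for a plain table() call, which also sidesteps tm's default of keeping only terms at least three characters long. The vector words is made-up sample data standing in for segmenter output, and df_toy is a hypothetical name chosen for illustration.

```r
# Made-up sample data standing in for the output of a word segmenter.
words <- c("good", "effects", "good", "plot", "effects", "good")

# table() counts how often each distinct word occurs.
counts <- table(words)

# Build the same two-column layout (name, freq) used for the word cloud.
df_toy <- data.frame(
  name = names(counts),
  freq = as.numeric(counts),
  stringsAsFactors = FALSE
)

# Sort so the most frequent word comes first.
df_toy <- df_toy[order(-df_toy$freq), ]

print(df_toy)
```

A data frame in this shape, with the word in the first column and its count in the second, is what wordcloud2() expects as its data argument, so df_toy could be passed to it in place of df.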
