2 Installing Spark and Python Practice

1. Installing Spark

Check the base environment: Hadoop and JDK
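Before installing Spark, it helps to confirm that the JDK and Hadoop can be reached from the shell. This is a minimal sketch. It assumes both are meant to be on PATH as the `java` and `hadoop` commands, and the helper names `check_tools` and `print_versions` are hypothetical.

```python
import shutil
import subprocess

def check_tools(names):
    """Map each command name to its full path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}

def print_versions():
    """Print the location and first version line of java and hadoop (assumed command names)."""
    # "java -version" writes to stderr, "hadoop version" writes to stdout
    commands = {"java": ["java", "-version"], "hadoop": ["hadoop", "version"]}
    for name, path in check_tools(commands).items():
        if path is None:
            print(f"{name}: not found on PATH")
            continue
        result = subprocess.run(commands[name], capture_output=True, text=True)
        banner = (result.stdout or result.stderr).strip().splitlines()
        print(f"{name}: {path}")
        if banner:
            print("  " + banner[0])
```

When you call `print_versions()`, it prints where each tool was found and the first line of its version output. If a tool is not on PATH, it reports that instead.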

Configuration files
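Some Spark packages are built without bundled Hadoop (the "Hadoop free" build). For these, Spark's documentation has `conf/spark-env.sh` export `SPARK_DIST_CLASSPATH` from the output of `hadoop classpath`. You usually create `spark-env.sh` by copying the `spark-env.sh.template` file in the same directory. The sketch below automates that edit in Python. The directories passed in (for example `/usr/local/spark/conf` and `/usr/local/hadoop`) are assumptions about the install locations, and the function names are hypothetical.

```python
from pathlib import Path

def spark_env_line(hadoop_home):
    """Build the spark-env.sh line that puts Hadoop's jars on Spark's classpath."""
    return f"export SPARK_DIST_CLASSPATH=$({hadoop_home}/bin/hadoop classpath)"

def add_dist_classpath(conf_dir, hadoop_home):
    """Append the line to <conf_dir>/spark-env.sh unless it is already there."""
    env_file = Path(conf_dir) / "spark-env.sh"
    template = Path(conf_dir) / "spark-env.sh.template"
    # Start from the shipped template if spark-env.sh does not exist yet
    if not env_file.exists() and template.exists():
        env_file.write_text(template.read_text())
    existing = env_file.read_text() if env_file.exists() else ""
    line = spark_env_line(hadoop_home)
    if line not in existing.splitlines():
        with env_file.open("a") as f:
            # Make sure the new line does not get glued onto the last existing line
            if existing and not existing.endswith("\n"):
                f.write("\n")
            f.write(line + "\n")
    return env_file
```

For example, run `add_dist_classpath("/usr/local/spark/conf", "/usr/local/hadoop")`. Running it a second time still leaves only one copy of the line.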

Environment variables
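Environment variables for Spark are commonly added to `~/.bashrc` and loaded with `source ~/.bashrc`. The hypothetical helper `spark_exports` below generates the usual lines as an illustration:

- `SPARK_HOME`
- `PATH`
- a `PYTHONPATH` that includes Spark's Python package and its bundled py4j zip
- `PYSPARK_PYTHON`

The install path is an assumption. The py4j zip name changes between Spark releases, so the sketch looks it up under `$SPARK_HOME/python/lib` instead of hard-coding it.

```python
from pathlib import Path

def spark_exports(spark_home, python="python3"):
    """Return ~/.bashrc export lines for a Spark install at spark_home (assumed path)."""
    lib = Path(spark_home) / "python" / "lib"
    # Find the bundled py4j zip; its version number differs between Spark releases
    zips = sorted(lib.glob("py4j-*-src.zip"))
    py4j = f":$SPARK_HOME/python/lib/{zips[-1].name}" if zips else ""
    return "\n".join([
        f"export SPARK_HOME={spark_home}",
        "export PATH=$SPARK_HOME/bin:$PATH",
        f"export PYTHONPATH=$SPARK_HOME/python{py4j}:$PYTHONPATH",
        f"export PYSPARK_PYTHON={python}",
    ])
```

You can print `spark_exports("/usr/local/spark")`, paste the result into `~/.bashrc`, and then run `source ~/.bashrc` so it takes effect.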

Trial run of Python
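Once the variables are loaded, the `pyspark` command in Spark's `bin` directory starts an interactive Python shell. The same check can also be scripted. The sketch below first tests whether the `pyspark` package can be imported, then runs a tiny job on a local master. The function names are hypothetical, and the job assumes Java is available to Spark.

```python
import importlib.util

def pyspark_available():
    """True if the pyspark package can be imported by this interpreter."""
    return importlib.util.find_spec("pyspark") is not None

def run_smoke_test():
    """Sum 1..10 with a local SparkContext; a working install should return 55."""
    from pyspark import SparkContext  # imported lazily so this module loads without Spark
    sc = SparkContext("local", "smoke-test")
    try:
        return sc.parallelize(range(1, 11)).sum()
    finally:
        sc.stop()
```

Call `run_smoke_test()` only when `pyspark_available()` returns True.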

2. Python programming exercise: word frequency count for English text

Prepare the text file
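Any English text saved as a plain UTF-8 file will work. As a hypothetical stand-in, the snippet below writes a two-line sample to `test.txt`, which is the file name the script in this section reads.

```python
from pathlib import Path

# Hypothetical sample text; any English article saved as plain text works the same way
SAMPLE = """The quick brown fox jumps over the lazy dog.
The dog sleeps, and the fox runs away!
"""

def prepare_sample(path="test.txt"):
    """Write the sample text to path as UTF-8 and return the Path."""
    p = Path(path)
    p.write_text(SAMPLE, encoding="utf-8")
    return p
```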

Read the file and preprocess it: letter case, punctuation, stopwords
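The three preprocessing steps can also be written with `str.translate`. It removes all punctuation in one pass, instead of one `replace` call per character. This sketch assumes the stopword list is a UTF-8 text file with one word per line, and `load_stopwords` and `preprocess` are hypothetical names.

```python
import string

def load_stopwords(path):
    """Read one stopword per line (assumed file format), skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def preprocess(text, stopwords):
    """Lowercase, turn ASCII punctuation into spaces, split, and drop stopwords."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.lower().translate(table).split()
    return [w for w in words if w not in stopwords]
```

`string.punctuation` covers ASCII punctuation only, so curly quotes or long dashes pasted from a web page are not removed. An apostrophe also becomes a space, which splits "don't" into "don" and "t".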

Tokenize and count how many times each word occurs
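Counting takes a single pass over the word list. The first function below uses the same `counts.get(word, 0) + 1` idiom as the script in this section. The second shows that `collections.Counter` builds the same mapping. Both function names are hypothetical.

```python
from collections import Counter

def count_words(words):
    """Count occurrences with a plain dict and the dict.get idiom."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

def count_words_counter(words):
    """Same result using collections.Counter, converted back to a plain dict."""
    return dict(Counter(words))
```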

Sort by word frequency
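Sorting only by count leaves words with equal counts in the order they were first seen. The sketch below sorts by count in descending order and then alphabetically, so the top-10 list is the same on every run. `top_n` is a hypothetical name.

```python
def top_n(counts, n=10):
    """Sort (word, count) pairs by count descending, breaking ties alphabetically."""
    items = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return items[:n]  # slicing also works when there are fewer than n words
```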

Write the results to a file
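The script in this section prints its results but does not save them. Below is a minimal sketch of the saving step. It assumes a CSV output file named `result.csv` (a hypothetical name and format) and the `(word, count)` pairs produced by the sort.

```python
import csv

def write_results(items, path="result.csv"):
    """Write (word, count) pairs to a UTF-8 CSV file with a header row."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "count"])
        writer.writerows(items)
```

For example, `write_results(items[:10])` saves the same top-10 list that the script prints.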

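with open("test.txt", "r", encoding="utf-8") as f:
    text = f.read()
# Preprocessing: convert everything to lowercase
text = text.lower()
# Replace punctuation with spaces (the end of the original character list was lost; the tail here is reconstructed)
for ch in '!@#$%^&*(_)-+=\\}{[]|;:\'",<.>/?~`':
    text = text.replace(ch, " ")
# Split into words and drop stopwords (example stopword set; replace with your own list)
stopwords = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}
words2 = [w for w in text.split() if w not in stopwords]
# Count how many times each word occurs
counts = {}
for word in words2:
    counts[word] = counts.get(word, 0) + 1
# Sort by frequency, highest first
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
# Print the top 10 elements (fewer if the text has fewer distinct words)
for i in range(min(10, len(items))):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))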
