Homework 8

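WordCount program task:

Program: wordcount

Input: a text file containing a large number of words.

Output: every word in the file and its number of occurrences (frequency),

sorted alphabetically by word,

with each word and its frequency on one line, separated by a gap.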

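import cProfile
import pstats


def process_file(dst):
    """Read the whole file at path dst; return its text, or None on error."""
    try:
        f = open(dst, "r")  # open the file
    except IOError as s:
        print(s)
        return None
    try:
        bvffer = f.read()  # read the file into a buffer
    except Exception:
        print('read file error!')
        return None
    finally:
        f.close()  # close the file even if the read failed
    return bvffer


def process_buffer(bvffer):
    """Count how often each word occurs in bvffer."""
    if bvffer:
        word_freq = {}
        # Process the buffer: count each word's frequency and store it in the dict word_freq
        bvffer = bvffer.lower()
        # Remove Chinese and English punctuation from the text
        for ch in '「『!;,.?」':
            bvffer = bvffer.replace(ch, " ")
        words = bvffer.strip().split()
        for word in words:
            word_freq[word] = word_freq.get(word, 0) + 1  # count the word
        return word_freq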
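The task statement also asks for the words to be printed in alphabetical order, one word and its frequency per line, and the listing here stops at the counting step. A minimal sketch of that output step follows. The helper name `format_word_freq` is hypothetical (it does not appear in the original program), and using a tab as the gap between word and frequency is an assumption.

```python
def format_word_freq(word_freq):
    """Return one 'word<TAB>count' line per word, sorted alphabetically by word."""
    # sorted() on the (word, count) pairs orders them by word first.
    return ["%s\t%d" % (word, count) for word, count in sorted(word_freq.items())]


if __name__ == "__main__":
    # Hypothetical sample counts, standing in for the dict built by process_buffer.
    sample = {"the": 3, "cat": 1, "sat": 2}
    for line in format_word_freq(sample):
        print(line)
```

Returning the lines instead of printing them directly keeps the formatting easy to check separately from file handling.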

Grant execute permission

Write reducer.py

#!/usr/bin/env python
import sys

current_word = None
current_count = 0
word = None

# Input arrives on stdin as "word<TAB>count" lines, already sorted by word
for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        # Skip lines whose count is not a number
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # The word changed: emit the total for the previous word
            print("%s\t%s" % (current_word, current_count))
        current_count = count
        current_word = word

# Emit the total for the last word, if any input was read
if current_word is not None:
    print("%s\t%s" % (current_word, current_count))

Grant execute permission

$ chmod +x reducer.py
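reducer.py only compares each word with the previous one, so it depends on its input arriving sorted by word. The sketch below simulates the whole map, sort, and reduce flow in one process to make that dependency visible. It is an illustration under assumptions, not the code used in this homework: the mapper is not shown in this post, so `map_words`, `reduce_sorted`, and `simulate` are hypothetical names, and the mapper is assumed to emit one `(word, 1)` pair per whitespace-separated word.

```python
from itertools import groupby


def map_words(lines):
    """Assumed mapper: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.strip().split():
            yield (word, 1)


def reduce_sorted(pairs):
    """Sum the counts of adjacent pairs that share a word (input must be sorted by word)."""
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield (word, sum(count for _, count in group))


def simulate(lines):
    """Run map, sort, and reduce in memory on a list of text lines."""
    # sorted() stands in for the sort step that happens between mapper and reducer.
    shuffled = sorted(map_words(lines))
    return list(reduce_sorted(shuffled))


if __name__ == "__main__":
    for word, total in simulate(["the cat sat", "the cat"]):
        print("%s\t%d" % (word, total))
```

Removing the `sorted()` call makes the same word show up in several separate groups, which is the failure reducer.py would also have on unsorted input.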
