Understanding the MapReduce Computing Framework

Task: write a wordcount program in Python

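Program: wordcount

Input: a text file containing a large number of words.

Output: every word in the file together with the number of times it occurs (its frequency), sorted alphabetically by word. Each word and its frequency occupy one line, separated by whitespace.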
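As a reference point for the expected output, the specification above can be sketched in plain Python without MapReduce. This is only an illustration: the function name `word_count_lines` and the sample text are assumptions, not part of the original task.

```python
from collections import Counter


def word_count_lines(text):
    """Return 'word<TAB>count' lines, sorted alphabetically by word."""
    counts = Counter(text.split())  # split on any whitespace
    return ['%s\t%d' % (word, counts[word]) for word in sorted(counts)]


if __name__ == '__main__':
    sample = 'b a b\nc a b'  # hypothetical sample input
    for line in word_count_lines(sample):
        print(line)
```

On a small file, the output of the MapReduce job should match what this function returns, which makes it a convenient cross-check.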

1. Write the map function and the reduce function

cd /home/hadoop

mkdir wc

cd /home/hadoop/wc

touch mapper.py

touch reducer.py

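mapper.py:

#!/usr/bin/env python
import sys

# Emit "<word>\t1" for every whitespace-separated word read from stdin.
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print('%s\t%s' % (word, 1))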

reducer.py:

#!/usr/bin/env python
import sys

current_word = None
current_count = 0
word = None

# The input arrives sorted by word, so identical words are on adjacent lines.
for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        # Skip lines whose count is not an integer.
        continue
    if current_word == word:
        current_count += count
    else:
        # A new word starts: print the finished one first.
        if current_word:
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# Print the last word, if any input was read.
if current_word:
    print('%s\t%s' % (current_word, current_count))
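The reducer relies on its input being sorted by word, so that identical words arrive on consecutive lines. The same logic can be written more compactly with `itertools.groupby` and `operator.itemgetter`. The sketch below is an alternative formulation, not the script used in the rest of this post, and the function names `parse_pairs` and `reduce_counts` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def parse_pairs(lines):
    """Yield (word, count) from 'word<TAB>count' lines, skipping bad ones."""
    for line in lines:
        word, sep, count = line.strip().partition('\t')
        if not sep:
            continue  # no tab separator on this line
        try:
            value = int(count)
        except ValueError:
            continue  # count is not an integer
        yield word, value


def reduce_counts(lines):
    """Sum the counts of adjacent equal words (input must be sorted by word)."""
    for word, group in groupby(parse_pairs(lines), key=itemgetter(0)):
        yield word, sum(count for _, count in group)


if __name__ == '__main__':
    # Hypothetical, already sorted mapper output.
    sample = ['hadoop\t1', 'hello\t1', 'hello\t1', 'world\t1']
    for word, total in reduce_counts(sample):
        print('%s\t%d' % (word, total))
```

To use it as an actual streaming reducer, `reduce_counts` would be given `sys.stdin` instead of the sample list.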

2. Make the scripts executable

chmod a+x /home/hadoop/wc/mapper.py

chmod a+x /home/hadoop/wc/reducer.py

3. Test locally
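One way to check the logic before touching the cluster is to imitate the three stages (map, sort, reduce) inside a single Python process. Hadoop Streaming sorts the mapper output by key before it reaches the reducer, and the `sorted` call below stands in for that step. The function names and the sample lines are assumptions made for illustration, and the sketch does not run mapper.py or reducer.py themselves.

```python
def map_words(lines):
    """Map stage: emit a (word, 1) pair for every word."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


def reduce_sorted(pairs):
    """Reduce stage: sum counts of adjacent equal keys (input must be sorted)."""
    current_word, current_count = None, 0
    for word, count in pairs:
        if word == current_word:
            current_count += count
        else:
            if current_word is not None:
                yield current_word, current_count
            current_word, current_count = word, count
    if current_word is not None:
        yield current_word, current_count


def run_local(lines):
    """Map, then sort by key (the stand-in for the shuffle), then reduce."""
    return list(reduce_sorted(sorted(map_words(lines))))


if __name__ == '__main__':
    sample = ['hello world', 'hello hadoop']  # hypothetical input lines
    for word, total in run_local(sample):
        print('%s\t%d' % (word, total))
```

Removing the `sorted` call shows why the sort matters: the reducer would then emit a separate line for each run of a word instead of one total per word.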

4. Run on HDFS

6. Submit the job with the hadoop streaming command:

ls /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar

Open the environment variable configuration file:

gedit ~/.bashrc

Add the streaming path to it:

export stream=$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar

Make the environment variable take effect:

source ~/.bashrc
echo $stream

Create a shell script named run.sh to run the job:

gedit run.sh

hadoop jar $stream \
-file /home/hadoop/wc/mapper.py \
-mapper /home/hadoop/wc/mapper.py \
-file /home/hadoop/wc/reducer.py \
-reducer /home/hadoop/wc/reducer.py \
-input /user/hadoop/input/*.txt \
-output /user/hadoop/wcoutput
