Simple MapReduce with Python 1

All steps below assume that the Hadoop cluster has already been deployed and is working properly.

Python source code. First, the mapper:

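#!/usr/bin/env python
import sys
# input comes from stdin (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words on any run of whitespace
    words = line.split()
    for word in words:
        # emit "word<TAB>1"; Hadoop Streaming treats the text before the first tab as the key
        print('%s\t%s' % (word, 1))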
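To check the mapper logic without a cluster, it can help to pull the loop into a function that accepts any iterable of lines instead of reading sys.stdin directly. The following is a minimal sketch under that assumption; the function name map_words is hypothetical and not part of the original scripts.

```python
from typing import Iterable, Iterator


def map_words(lines: Iterable[str]) -> Iterator[str]:
    """Yield one "word<TAB>1" record per whitespace-separated word."""
    for line in lines:
        # str.split() with no argument splits on any run of whitespace and
        # ignores leading/trailing whitespace, so a separate strip() is optional.
        for word in line.split():
            yield '%s\t%s' % (word, 1)


# Feed it a list of strings instead of sys.stdin for a quick check.
for record in map_words(["hello world", "hello hadoop"]):
    print(record)
```

In the real streaming script you would call it as `for record in map_words(sys.stdin): print(record)`, which keeps the stdin/stdout contract that Hadoop Streaming relies on.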

reduce.py

#!/usr/bin/env python
from operator import itemgetter
import sys
word2count = {}
# input comes from stdin
for line in sys.stdin:
    line = line.strip()
    try:
        # each line is "word<TAB>count" as written by the mapper
        word, count = line.split('\t', 1)
        count = int(count)
    except ValueError:
        # the line had no tab or count was not a number,
        # so silently ignore/discard this line
        continue
    word2count[word] = word2count.get(word, 0) + count
# sort by word so the output is in alphabetical order
sorted_word2count = sorted(word2count.items(), key=itemgetter(0))
for word, count in sorted_word2count:
    print('%s\t%s' % (word, count))
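reduce.py keeps every distinct word in the word2count dictionary until all input has been read. Because Hadoop sorts the mapper output by key before handing it to a reducer, an alternative (not the approach used above) is to sum consecutive lines that share a word with itertools.groupby, so only one word's running total is held at a time. A sketch, with hypothetical function names parse and reduce_sorted:

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Tuple


def parse(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Turn "word<TAB>count" lines into (word, count) pairs, skipping bad lines."""
    for line in lines:
        try:
            word, count = line.strip().split('\t', 1)
            value = int(count)
        except ValueError:
            # no tab, or count is not an integer: discard, as reduce.py does
            continue
        yield word, value


def reduce_sorted(lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per word, assuming the lines arrive already sorted by word."""
    # groupby only merges *adjacent* equal keys, which is why sorted input matters
    for word, group in groupby(parse(lines), key=itemgetter(0)):
        total = sum(count for _, count in group)
        yield '%s\t%d' % (word, total)


sample = ["apple\t1", "apple\t1", "banana\t1", "not-a-record"]
for record in reduce_sorted(sample):
    print(record)
```

If the input were not sorted, groupby would emit the same word more than once; this variant therefore depends on the framework's sort, and on piping through `sort` first when testing locally.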

Save both scripts under /home/src, then cd into that directory.

Create a test directory on HDFS:

ls

hadoop fs -ls /user/hdfs

mkdir

hadoop fs -mkdir /user/hdfs/test

Copy the test files from the local disk to HDFS:

hadoop fs -copyFromLocal /home/src/*.txt /user/hdfs/test/

Run the MapReduce job with streaming.jar
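Hadoop Streaming runs the mapper and reducer as separate processes: it feeds input lines to the mapper on stdin, sorts the mapper's output by key, and pipes the sorted records into the reducer's stdin. Before submitting a job, the same flow can be imitated in plain Python. This is only a rough, single-reducer approximation; mapper, reducer and run_local_job are illustrative names, and sorted() stands in for Hadoop's shuffle and sort.

```python
from collections import Counter
from typing import Iterable, List


def mapper(lines: Iterable[str]) -> List[str]:
    # same logic as the mapper script: one "word<TAB>1" record per word
    return ['%s\t%s' % (word, 1) for line in lines for word in line.split()]


def reducer(records: Iterable[str]) -> List[str]:
    # same logic as reduce.py: sum counts per word, output sorted by word
    totals = Counter()
    for record in records:
        try:
            word, count = record.split('\t', 1)
            totals[word] += int(count)
        except ValueError:
            continue  # malformed record, ignored like in reduce.py
    return ['%s\t%s' % (word, totals[word]) for word in sorted(totals)]


def run_local_job(lines: Iterable[str]) -> List[str]:
    """Rough local stand-in for a streaming job with a single reducer."""
    mapped = mapper(lines)
    # every record starts with its key, so sorting whole lines groups equal keys
    shuffled = sorted(mapped)
    return reducer(shuffled)


for record in run_local_job(["the quick fox", "the lazy dog"]):
    print(record)
```

The equivalent shell check is to pipe the test files through the mapper script, then `sort`, then reduce.py, and compare the result with what the cluster job writes.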

Job output:

......

14/11/26 12:54:52 INFO mapreduce.Job: map 0% reduce 0%

14/11/26 12:54:59 INFO mapreduce.Job: map 100% reduce 0%

14/11/26 12:55:04 INFO mapreduce.Job: map 100% reduce 100%

14/11/26 12:55:04 INFO mapreduce.Job: Job job_1415798121952_0179 completed successfully

......

14/11/26 12:55:04 INFO streaming.StreamJob: Output directory: /user/hdfs/test/reducer

......

View the result files:

hadoop fs -ls /user/hdfs/test

......

drwxr-xr-x - root hadoop 0 2014-11-26 12:55 /user/hdfs/test/reducer

......
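The listing shows that /user/hdfs/test/reducer is a directory; the actual counts live in the part files inside it, which can be printed with `hadoop fs -cat` (names such as part-00000 are typical, but depend on the job). Since each output line is word<TAB>count, a small helper can load that text back into Python for inspection. A sketch; load_counts and top_words are hypothetical helpers, and sample_output is made-up input, not output from the job above.

```python
from typing import Dict, Iterable, List, Tuple


def load_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse "word<TAB>count" lines (the reducer's output format) into a dict."""
    counts: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip empty lines
        word, count = line.split('\t', 1)
        # summing keeps this correct even if several files are concatenated
        counts[word] = counts.get(word, 0) + int(count)
    return counts


def top_words(counts: Dict[str, int], n: int) -> List[Tuple[str, int]]:
    """Return the n most frequent words, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Stand-in for the text of a result file, e.g. captured from `hadoop fs -cat`.
sample_output = ["dog\t1\n", "fox\t1\n", "the\t2\n"]
print(top_words(load_counts(sample_output), 2))
```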
