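Getting My First MapReduce Driver Running

Note: this starts from Hadoop: The Definitive Guide.

Hadoop is for storing and analyzing data, with MapReduce doing the analysis. It is much like Lucene: a complete, working search engine has to build an index first and then analyze the data according to what the client asks for.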

1, mapper

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // (the method body is missing from the source)
    }

2, reducer

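The reducer's parameters are analogous to the mapper's. This is where the data actually gets analyzed, so this is where the business logic lives.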

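    public class MaxTemperatureReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        // (the reduce(...) signature and the loop that computes maxValue are missing from the source)
        System.err.println("key= " + key + " maxvalue=" + maxValue);
        output.collect(key, new IntWritable(maxValue));
    }
    }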
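The loop that computes the maximum does not appear in the listing above, so here is a minimal sketch of what such a reduce step computes. It is plain Java with no Hadoop types; the class name MaxReduceSketch, the helper maxOf and the sample readings are all made up for illustration and are not the author's code.

```java
import java.util.Arrays;
import java.util.Iterator;

// Plain-Java sketch of the reduce step; no Hadoop types are used here.
class MaxReduceSketch {

    // Hypothetical helper: returns the largest value seen for one key.
    static int maxOf(Iterator<Integer> values) {
        int maxValue = Integer.MIN_VALUE;
        while (values.hasNext()) {
            maxValue = Math.max(maxValue, values.next());
        }
        return maxValue;
    }

    public static void main(String[] args) {
        // Made-up readings for a single key, for illustration only.
        Iterator<Integer> readings = Arrays.asList(111, -11, 78).iterator();
        System.out.println("maxValue=" + maxOf(readings));
    }
}
```

In a real reducer the values would arrive as an Iterator of IntWritable, and the result would be handed to output.collect instead of being printed.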

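3, driver

    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(this.getConf(), this.getClass());
        // Input path: here a directory whose files are read as input;
        // its subdirectories are not read recursively
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        // Output path: must be a directory that does not exist yet; Hadoop creates it,
        // and the reducer's output is saved there in files named by a fixed convention
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // (no conf.setMapperClass(...) call appears in the source listing)
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setCombinerClass(MaxTemperatureReducer.class);
        conf.setReducerClass(MaxTemperatureReducer.class);
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // (the body of main is missing from the source)
    }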
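The driver registers the same reducer class as both combiner and reducer. That is safe here because a maximum gives the same answer whether it is taken in one pass or first per map output and then over the partial results. A minimal sketch of that property, in plain Java without Hadoop; the class name, method names and sample values are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java sketch; each inner list stands in for the map output of one key.
class CombinerSketch {

    // Reduce step: the largest value in a list.
    static int max(List<Integer> values) {
        return Collections.max(values);
    }

    // Without a combiner: every value reaches the reducer.
    static int reduceAll(List<List<Integer>> mapOutputs) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> part : mapOutputs) {
            all.addAll(part);
        }
        return max(all);
    }

    // With a combiner: each map output is pre-reduced, then the partial maxima are reduced.
    static int reduceCombined(List<List<Integer>> mapOutputs) {
        List<Integer> partialMaxima = new ArrayList<>();
        for (List<Integer> part : mapOutputs) {
            partialMaxima.add(max(part));
        }
        return max(partialMaxima);
    }

    public static void main(String[] args) {
        // Made-up values for illustration only.
        List<List<Integer>> mapOutputs = Arrays.asList(
                Arrays.asList(0, 20, 10),
                Arrays.asList(25, 15));
        System.out.println(reduceAll(mapOutputs) + " " + reduceCombined(mapOutputs));
    }
}
```

An operation such as an average would not pass this check, since the average of partial averages is generally not the overall average, so a reducer of that kind could not be reused as a combiner unchanged.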

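To run it, copy the class files into the HADOOP_HOME directory, then run from the command line:

$ hadoop ***.***.Map*Driver -fs file:/// -jt local in out

This runs Map*Driver against the local file system (HDFS would work too, of course): it reads the files in the in directory and saves the result data to the out directory. Hadoop creates out itself; if the directory already exists, the job fails with a FileAlreadyExistsException. For example:

hadoop com.awen.mapreduce.MaxTemperatureDriver -fs file:/// -jt local in out

The command can also take another form: pass a configuration file with hadoop *** -conf *** to override the Configuration.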
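As a sketch of the -conf form, the file below sets the same two values that -fs file:/// -jt local set on the command line. The file name hadoop-local.xml is an assumption for illustration; the property names fs.default.name and mapred.job.tracker are the ones used by Hadoop 0.20.x, the version in the /opt/hadoop-0.20.2 path that appears in this post.

```xml
<?xml version="1.0"?>
<!-- hadoop-local.xml (hypothetical file name): local file system, local job runner -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
  </property>
</configuration>
```

It would then be passed as hadoop com.awen.mapreduce.MaxTemperatureDriver -conf hadoop-local.xml in out.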

If the out directory already exists:

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/opt/hadoop-0.20.2/out already exists
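The rule behind this error is simply "refuse to start if the output directory is already there". A minimal sketch that mimics the check on the local file system, using java.nio rather than Hadoop's own file system API; the class name and the helper requireAbsent are made up for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Plain-Java sketch of a "fail fast if the output exists" check; not Hadoop's implementation.
class OutputDirCheckSketch {

    // Hypothetical helper: throws if the output directory is already there.
    static void requireAbsent(Path out) {
        if (Files.exists(out)) {
            throw new IllegalStateException("Output directory " + out + " already exists");
        }
    }

    public static void main(String[] args) {
        // The system temp directory is used only as an example of a path that exists.
        Path existing = Paths.get(System.getProperty("java.io.tmpdir"));
        try {
            requireAbsent(existing);
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```

Failing before any work is done means an earlier run's results are never silently overwritten; the price is that out has to be removed or renamed by hand before each rerun.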

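Next, switch the file system to HDFS.

First copy the data from the local file system into HDFS, to the temperature file under the in directory of the current user's home directory:

hadoop fs -copyFromLocal /opt/hadoop*/in/temperature in/temperature

Then run hadoop com.awen.mapreduce.MaxTemperatureDriver in out

This time, in local pseudo-distributed mode, the job runs much, much slower on HDFS than it did on the local file system. Why that is can only be worked out when digging into HDFS itself, haha.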
