What each class does in a MapReduce program


1. The Map class

protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException

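2. The Reducer class

The Reducer class extends the library's Reducer, whose prototype is Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>. Apart from the reduce method, it is structured the same way as the Map class, and its other methods serve the same purpose. The reduce method is:

protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException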
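To make the division of labour between map and reduce concrete, here is a minimal plain-Java sketch of a deduplication job like the "dedup" job configured in the driver below. It does not use any Hadoop classes: DedupSketch and its methods are hypothetical names, the TreeMap stands in for the shuffle phase that groups values by key, and the dedup logic itself (emit each line as a key, output each key once) is an assumption about what such a job would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of map -> shuffle -> reduce; no Hadoop classes involved.
class DedupSketch {

    // "map": emit the whole line as the key, with an empty value.
    static void map(String line, Map<String, List<String>> shuffle) {
        shuffle.computeIfAbsent(line, k -> new ArrayList<>()).add("");
    }

    // "reduce": called once per distinct key with all of that key's values.
    // The values are ignored; the key is written exactly once.
    static void reduce(String key, Iterable<String> values, List<String> output) {
        output.add(key);
    }

    // Runs the whole flow over a list of input lines.
    static List<String> run(List<String> lines) {
        // The TreeMap groups values by key and keeps keys sorted,
        // which is the role the shuffle phase plays in a real job.
        Map<String, List<String>> shuffle = new TreeMap<>();
        for (String line : lines) {
            map(line, shuffle);
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : shuffle.entrySet()) {
            reduce(entry.getKey(), entry.getValue(), output);
        }
        return output;
    }

    public static void main(String[] args) {
        // prints [a, b, c]
        System.out.println(run(Arrays.asList("b", "a", "b", "c", "a")));
    }
}
```

Because reduce ignores its values and only emits the key, the same class can also serve as the combiner, which matches the driver setting the combiner and the reducer to the same class.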

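3. The MapReduce driver

Put simply, this is the code in the main function. It typically includes:

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Get the input and output file paths
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
        // Body missing in the original listing; assumed to report usage and exit
        System.err.println("Usage: dedup <in> <out>");
        System.exit(2);
    }
    Job job = new Job(conf, "dedup");
    job.setJarByClass(Dedup.class);          // main class
    job.setMapperClass(Map.class);           // map class (not in the original listing; class name assumed)
    job.setCombinerClass(Reduce.class);      // combiner class for the job
    job.setReducerClass(Reduce.class);       // reduce class
    job.setOutputKeyClass(Text.class);       // key class of the job's output data
    job.setOutputValueClass(Text.class);     // value class of the job's output data
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));     // input files
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));   // output files
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}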

There is also a minimal MapReduce driver, a MiniMapReduceDriver class that makes only the following calls and leaves everything else at its default:

Job job = new Job(conf, "dedup");
job.setJarByClass(Dedup.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);

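4. The InputFormat interface

[Figure: InputFormat class hierarchy]

TextInputFormat is the default InputFormat implementation. It works well when the input data has no explicit key-value structure: the key it returns is the byte offset of the line within the file, and the value is the content of the line.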
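The key-value pairs described above can be illustrated in plain Java. This is only a sketch of the record layout, not Hadoop code: OffsetLineSketch and records are hypothetical names, and the input is assumed to be UTF-8 text with "\n" line endings. Note that the key counts bytes, not characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Shows the (byte offset, line content) pairs produced for plain text input.
class OffsetLineSketch {

    // Returns one "offset<TAB>line" string per input line.
    static List<String> records(String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        List<String> out = new ArrayList<>();
        int lineStart = 0;
        for (int i = 0; i <= data.length; i++) {
            boolean atEnd = (i == data.length);
            // A line ends at a newline, or at end of data if bytes remain.
            if (atEnd ? lineStart < data.length : data[i] == '\n') {
                String line = new String(data, lineStart, i - lineStart, StandardCharsets.UTF_8);
                out.add(lineStart + "\t" + line);
                lineStart = i + 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "hello" is 5 bytes plus 1 newline byte, so "world" starts at offset 6.
        for (String record : records("hello\nworld\n")) {
            System.out.println(record);
        }
    }
}
```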

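5. The InputSplit class

By default, FileInputFormat and its subclasses split files in units of 64 MB. It is recommended that the split size match the HDFS block size, which defaulted to 64 MB in Hadoop 1.x (later versions default to 128 MB). Processing a file in chunks lets several map tasks work on one file in parallel, which greatly improves performance for large files. The input to each map task is one such input split, an InputSplit.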

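InputSplit has the subclasses FileSplit and CombineFileSplit. Both hold the file path, the start position of the split, the split length, and the list of hosts that store the split's data. CombineFileSplit is aimed at small files: it packs many small files into a single InputSplit, so that a large number of small files can be handled by one map task.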
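The idea of packing small files into one split can be sketched as a simple greedy grouping. This is an illustration under simplifying assumptions, not Hadoop's actual algorithm (which also takes node and rack locality into account): CombineSketch, pack and maxSplitSize are hypothetical names, and files are represented only by their sizes in bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Greedy grouping of small files into combined splits, by size only.
class CombineSketch {

    // Packs file sizes into groups whose total stays within maxSplitSize.
    // A single file larger than the limit still gets a group of its own.
    static List<List<Long>> pack(List<Long> fileSizes, long maxSplitSize) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long size : fileSizes) {
            // Start a new group when adding this file would exceed the limit.
            if (!current.isEmpty() && currentSize + size > maxSplitSize) {
                splits.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(size);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Five small files grouped under a 60-byte limit.
        // prints [[10, 20, 30], [40, 5]]
        System.out.println(pack(Arrays.asList(10L, 20L, 30L, 40L, 5L), 60L));
    }
}
```

With one map task per group, five files need two map tasks here instead of five.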

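Some files cannot be split. There are two ways to handle them: the first is to set the minimum split size to a value larger than the file size; the second is to use a subclass of FileInputFormat that overrides the isSplitable method to return false.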
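The effect of the minimum split size can be worked through with a small calculation. The sketch below is plain Java modeled on the rule FileInputFormat uses to pick a split size, max(minSize, min(maxSize, blockSize)), together with its tolerance that lets the last split run up to 10% over. SplitSketch and its method names are hypothetical, and the real class does more (for example, it attaches host locations to each split).

```java
import java.util.ArrayList;
import java.util.List;

// Computes split boundaries for one file from block size and min/max split size.
class SplitSketch {
    // The last split may be up to 10% larger than the split size.
    static final double SPLIT_SLOP = 1.1;

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns {start, length} pairs covering the whole file.
    static List<long[]> splits(long fileLength, long splitSize) {
        List<long[]> result = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            result.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        if (remaining != 0) {
            result.add(new long[] {fileLength - remaining, remaining});
        }
        return result;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 200 * mb;
        // Default case: 64 MB blocks, no effective min or max. prints 4
        System.out.println(splits(fileLength, computeSplitSize(64 * mb, 1L, Long.MAX_VALUE)).size());
        // Minimum split size larger than the file: the file stays whole. prints 1
        System.out.println(splits(fileLength, computeSplitSize(64 * mb, 256 * mb, Long.MAX_VALUE)).size());
    }
}
```

A 200 MB file with 64 MB splits yields three 64 MB splits and one 8 MB split; raising the minimum split size above 200 MB yields a single split, which is the first of the two approaches described above.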

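6. The RecordReader class

InputSplit defines how the work is divided, while the RecordReader class defines how the data is loaded and converted into key-value pairs suitable for the map method to read. The default input format is TextInputFormat, whose RecordReader is LineRecordReader.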
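Since splits are cut at byte positions rather than at line ends, a line reader has to decide which split owns a line that straddles a boundary. The following plain-Java sketch shows one way to do that, following the convention used by Hadoop's line reader: a split that does not start at byte 0 skips its first line, and every split keeps reading until it has passed its own end. LineReaderSketch is a hypothetical class that works on an in-memory byte array; it only borrows the method names nextKeyValue, getCurrentKey and getCurrentValue from the RecordReader API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Reads (byte offset, line) records for one split of an in-memory "file".
class LineReaderSketch {
    private final byte[] data;
    private final int end;       // first byte after this split
    private int pos;             // start of the next line to read
    private long currentKey;
    private String currentValue;

    LineReaderSketch(byte[] data, int start, int length) {
        this.data = data;
        this.end = start + length;
        // A split that does not begin the file skips its first line:
        // the previous split is responsible for reading it.
        this.pos = (start == 0) ? 0 : nextLineStart(start);
    }

    // Index just past the next newline at or after 'from' (or end of data).
    private int nextLineStart(int from) {
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, data.length);
    }

    // Advances to the next record; returns false when the split is exhausted.
    boolean nextKeyValue() {
        if (pos > end || pos >= data.length) {
            return false;
        }
        int next = nextLineStart(pos);
        int lineEnd = (data[next - 1] == '\n') ? next - 1 : next;
        currentKey = pos;
        currentValue = new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8);
        pos = next;
        return true;
    }

    long getCurrentKey() { return currentKey; }

    String getCurrentValue() { return currentValue; }

    // Convenience helper: all records of one split as "key<TAB>value".
    static List<String> readAll(byte[] data, int start, int length) {
        LineReaderSketch reader = new LineReaderSketch(data, start, length);
        List<String> out = new ArrayList<>();
        while (reader.nextKeyValue()) {
            out.add(reader.getCurrentKey() + "\t" + reader.getCurrentValue());
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = "aa\nbb\ncc\n".getBytes(StandardCharsets.UTF_8);
        // The boundary at byte 4 falls inside "bb"; the first split reads it whole.
        System.out.println(readAll(data, 0, 4));
        System.out.println(readAll(data, 4, 5));
    }
}
```

In the example the 9-byte input is cut at byte 4, in the middle of the second line. The first split returns the lines at offsets 0 and 3, the second split returns only the line at offset 6, so every line is read exactly once.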

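7. The OutputFormat class

Like InputFormat, most OutputFormat classes extend FileOutputFormat; NullOutputFormat and DBOutputFormat are the exceptions. The default format is TextOutputFormat. An OutputFormat supplies the RecordWriter implementation, which determines how the data is serialized. A RecordWriter handles one key-value pair at a time and writes the result to the location that the OutputFormat has prepared. It does this mainly through two functions, write and close: write takes a key-value pair from the MapReduce job and writes its bytes to disk, and close closes Hadoop's data stream to the output file.

[Figure: OutputFormat class hierarchy]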

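8. The RecordWriter class

LineRecordWriter is the RecordWriter used by default. For each record it writes the key's bytes, a tab character as the delimiter, the value's bytes, and a newline character.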
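The record layout and the write/close pair can be sketched in a few lines of plain Java. LineWriterSketch is a hypothetical stand-in, not the Hadoop class: it takes String keys and values, encodes them as UTF-8, and writes to any OutputStream rather than to a file in HDFS.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Writes each record as: key bytes, tab, value bytes, newline.
class LineWriterSketch {
    private final OutputStream out;

    LineWriterSketch(OutputStream out) {
        this.out = out;
    }

    // Serializes one key-value pair to the underlying stream.
    void write(String key, String value) throws IOException {
        out.write(key.getBytes(StandardCharsets.UTF_8));
        out.write('\t');
        out.write(value.getBytes(StandardCharsets.UTF_8));
        out.write('\n');
    }

    // Closes the stream once all records have been written.
    void close() throws IOException {
        out.close();
    }

    // Convenience helper: writes all pairs to memory and returns the text.
    static String render(String[][] pairs) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            LineWriterSketch writer = new LineWriterSketch(buffer);
            for (String[] pair : pairs) {
                writer.write(pair[0], pair[1]);
            }
            writer.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two records, one per output line, key and value separated by a tab.
        System.out.print(render(new String[][] {{"k1", "v1"}, {"k2", "v2"}}));
    }
}
```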
