MapReduce Programming Model: Partitioner

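Partitioner is a top-level abstract class in MapReduce. It decides how the key-value pairs emitted by map are split across the n partitions set with job.setNumReduceTasks(n): which partition each pair lands in, and how to guarantee that identical keys land in the same partition. Only when identical keys land in the same partition can they all be processed by the same reducer.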

Here is the code:

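public static class MyPartitioner extends Partitioner<Text, Text> {
    // Key/value types are assumed to be Text; the method body was lost from the original listing.
    public int getPartition(Text key, Text value, int numPartitions) { return "hello".equals(key.toString()) ? 0 : 1; }
}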

This defines a Partitioner that sends a record to partition 0 only when its key is hello; every other key goes to partition 1. Running a test confirms this:

hdfs@yksp005206:/home/jumpserver$ hadoop fs -cat /test/wc/output/part-r-00000

hellovalue hello,hello,hello,

hdfs@yksp005206:/home/jumpserver$ hadoop fs -cat /test/wc/output/part-r-00001

hellpvalue hellp,

hivevalue hive,

kylinvalue kylin,

sparkvalue spark,

worldvalue world,
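The routing seen in the two output files can be reproduced without a cluster. The following is a minimal sketch, not the original job: it assumes plain String keys in place of Hadoop's Text, and the class name CustomPartitionDemo, the route helper, and the input order are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Standalone simulation of the routing rule; String keys stand in for Hadoop's Text (assumption).
class CustomPartitionDemo {

    // Same rule as described above: "hello" -> partition 0, every other key -> partition 1.
    // numPartitions is unused by this rule, so the job needs at least 2 reducers.
    static int getPartition(String key, int numPartitions) {
        return "hello".equals(key) ? 0 : 1;
    }

    // Groups the keys by the partition they are routed to (hypothetical helper).
    static Map<Integer, List<String>> route(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> partitions = new TreeMap<Integer, List<String>>();
        for (String key : keys) {
            int p = getPartition(key, numPartitions);
            partitions.computeIfAbsent(p, k -> new ArrayList<String>()).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Illustrative input; the real input file is not shown in the post.
        List<String> keys = Arrays.asList(
                "hello", "hellp", "hello", "hive", "kylin", "hello", "spark", "world");
        route(keys, 2).forEach((p, ks) -> System.out.println("partition " + p + ": " + ks));
    }
}
```

In a real job the partitioner would normally be registered with job.setPartitionerClass(MyPartitioner.class) together with job.setNumReduceTasks(2); each partition then becomes one part-r-0000N file, which matches the two files listed above.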

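The default implementation of Partitioner is HashPartitioner:

(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;

This takes the key's hashCode and then the remainder modulo n. The bitwise AND with Integer.MAX_VALUE clears the sign bit, so the result is never negative even when hashCode() is.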
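The formula can be tried out in isolation. This is a small sketch under stated assumptions: it uses ordinary Java objects (String, Integer) as keys, whereas Hadoop's Text computes its hash differently, so the partition numbers printed here need not match what a real job would produce. The class name HashPartitionDemo is hypothetical.

```java
// Standalone sketch of the hash-partitioning formula quoted above.
class HashPartitionDemo {

    // Masks off the sign bit, then takes the remainder modulo the reducer count.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int n = 3; // illustrative number of reduce tasks
        String[] keys = {"hello", "hive", "hello", "spark"};
        for (String key : keys) {
            // Equal keys have equal hash codes, so they always map to the same partition.
            System.out.println(key + " -> partition " + getPartition(key, n));
        }

        // A key with a negative hash code: Integer -7 hashes to -7.
        Integer negative = -7;
        System.out.println("plain remainder:  " + (negative.hashCode() % n));
        System.out.println("masked remainder: " + getPartition(negative, n));
    }
}
```

Without the mask, Java's % operator would return a negative remainder for a negative hash code, which is not a valid partition index; with the mask the result always falls in the range 0 to n - 1.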
