Classic MapReduce Examples

2022-09-10 04:24:13 · 3,888 characters · 4,887 reads


The core idea of MapReduce is divide and conquer: split the input into independent pieces, process each piece in parallel (map), then merge the partial results by key (reduce).
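The divide-and-conquer flow can be sketched in plain Java without Hadoop. This is only an illustrative simulation of the three phases in one JVM; the class and method names here are made up for the sketch and are not part of any Hadoop API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: simulates map -> shuffle -> reduce in one process.
public class MapReduceSketch {

    // Map phase: split one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle + reduce phase: group the pairs by key, sum each group.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : new String[] {"hello world", "hello mapreduce"}) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = reduce(pairs);
        System.out.println(counts.get("hello")); // prints 2
    }
}
```

In real Hadoop the shuffle happens between machines, but the key-grouping contract is the same as in this sketch.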

The following examples cover only the basics and use small amounts of data; no connection to a virtual machine is needed.

For every example, the result can be viewed in the part-r-00000 file inside the output folder the job creates.

Word count: given a file containing some number of words, count how many times each word appears.


package qfnu;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Mapper component: emits (word, 1) for every word in the line.
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String word : value.toString().split(" ")) {
            context.write(new Text(word), new IntWritable(1));
        }
    }
}

// Reducer component: sums the counts collected for each word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        context.write(key, new IntWritable(count));
    }
}

// Driver: wires the mapper and reducer into a job and sets the I/O paths.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // The input path is assumed; the original text only names the output folder.
        FileInputFormat.setInputPaths(job, new Path("d:/hadooptest/input"));
        FileOutputFormat.setOutputPath(job, new Path("d:/hadooptest/ansofcount"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

We can see the result in part-r-00000 under d:/hadooptest/ansofcount.

An inverted index makes searching more convenient: instead of scanning every document for a word, it maps each word to the documents where it occurs, along with a count per document.
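Before the Hadoop version, the data structure itself can be shown in a plain-Java sketch. The class name and the in-memory "filename → contents" representation are illustrative only; the sketch builds the same "word → file:count" postings the MapReduce job below produces.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: builds an inverted index from in-memory "files".
public class InvertedIndexSketch {

    // Input: filename -> file contents. Output: word -> (filename -> count).
    static Map<String, Map<String, Integer>> build(Map<String, String> files) {
        Map<String, Map<String, Integer>> index = new TreeMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            for (String word : file.getValue().split("\\s+")) {
                if (word.isEmpty()) continue;
                index.computeIfAbsent(word, k -> new TreeMap<>())
                     .merge(file.getKey(), 1, Integer::sum);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> files = new TreeMap<>();
        files.put("a.txt", "hello world");
        files.put("b.txt", "hello hadoop");
        System.out.println(build(files).get("hello")); // prints {a.txt=1, b.txt=1}
    }
}
```

A search for "hello" is then a single map lookup rather than a scan of both files.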


package qfnu;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Mapper: emits ("word:filename", "1") so the combiner can count per file.
class InvertedIndexMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final Text keyInfo = new Text();
    private final Text valueInfo = new Text("1");

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
        for (String word : value.toString().split(" ")) {
            keyInfo.set(word + ":" + fileName);
            context.write(keyInfo, valueInfo);
        }
    }
}

// Combiner: sums the per-file counts, then splits the key at ":" to
// turn ("word:filename", 1, 1, ...) into ("word", "filename:count").
class InvertedIndexCombiner extends Reducer<Text, Text, Text, Text> {
    private final Text keyInfo = new Text();
    private final Text valueInfo = new Text();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        for (Text value : values) {
            count += Integer.parseInt(value.toString());
        }
        int splitIndex = key.toString().indexOf(":");
        keyInfo.set(key.toString().substring(0, splitIndex));
        valueInfo.set(key.toString().substring(splitIndex + 1) + ":" + count);
        context.write(keyInfo, valueInfo);
    }
}

// Reducer: joins all "filename:count" postings for one word into a list.
class InvertedIndexReducer extends Reducer<Text, Text, Text, Text> {
    private final Text valueInfo = new Text();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        StringBuilder fileList = new StringBuilder();
        for (Text value : values) {
            fileList.append(value.toString()).append(";");
        }
        valueInfo.set(fileList.toString());
        context.write(key, valueInfo);
    }
}

// Driver: note the combiner class in addition to the mapper and reducer.
public class InvertedIndexDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "inverted index");
        job.setJarByClass(InvertedIndexDriver.class);
        job.setMapperClass(InvertedIndexMapper.class);
        job.setCombinerClass(InvertedIndexCombiner.class);
        job.setReducerClass(InvertedIndexReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Input and output paths are assumed for this example.
        FileInputFormat.setInputPaths(job, new Path("d:/hadooptest/input"));
        FileOutputFormat.setOutputPath(job, new Path("d:/hadooptest/invertedindex"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Deduplication: remove repeated lines from the input. Because the shuffle stage groups identical keys together, mapping each line to itself and writing each key once in the reducer is enough.

package qfnu;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Mapper: emits the whole line as the key; the value carries no information.
class DedupMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(value, NullWritable.get());
    }
}

// Reducer: identical lines arrive grouped under one key, so writing the
// key once per group removes the duplicates.
class DedupReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<NullWritable> values, Context context)
            throws IOException, InterruptedException {
        context.write(key, NullWritable.get());
    }
}

public class DedupDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "dedup");
        job.setJarByClass(DedupDriver.class);
        job.setMapperClass(DedupMapper.class);
        job.setReducerClass(DedupReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        // Input and output paths are assumed for this example.
        FileInputFormat.setInputPaths(job, new Path("d:/hadooptest/input"));
        FileOutputFormat.setOutputPath(job, new Path("d:/hadooptest/dedup"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Top-N: find the N largest numbers in the input (here N = 5). Each mapper keeps only its local top 5 in a descending TreeMap and emits them in cleanup(); a single reducer merges the candidates into the global top 5.

package qfnu;

import java.io.IOException;
import java.util.Comparator;
import java.util.TreeMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Mapper: keeps only its local top 5 numbers, emits them in cleanup().
class TopNMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final TreeMap<Integer, Integer> treeMap =
            new TreeMap<>(Comparator.reverseOrder());

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String field : value.toString().split(" ")) {
            int number = Integer.parseInt(field);
            treeMap.put(number, number);
            if (treeMap.size() > 5) {
                treeMap.remove(treeMap.lastKey()); // drop the smallest
            }
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        for (Integer i : treeMap.keySet()) {
            context.write(new Text("top"), new IntWritable(i));
        }
    }
}

// Reducer: every candidate shares the key "top", so one reduce call
// merges all the mappers' local top 5 lists into the global top 5.
class TopNReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final TreeMap<Integer, Integer> treeMap =
            new TreeMap<>(new Comparator<Integer>() {
                @Override
                public int compare(Integer a, Integer b) {
                    return b - a; // descending order
                }
            });

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        for (IntWritable value : values) {
            treeMap.put(value.get(), value.get());
            if (treeMap.size() > 5) {
                treeMap.remove(treeMap.lastKey());
            }
        }
        for (Integer i : treeMap.keySet()) {
            context.write(key, new IntWritable(i));
        }
    }
}

public class TopNDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "top n");
        job.setJarByClass(TopNDriver.class);
        job.setMapperClass(TopNMapper.class);
        job.setReducerClass(TopNReducer.class);
        job.setNumReduceTasks(1); // a single reducer must see every candidate
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output paths are assumed for this example.
        FileInputFormat.setInputPaths(job, new Path("d:/hadooptest/input"));
        FileOutputFormat.setOutputPath(job, new Path("d:/hadooptest/topn"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
