The core idea of MapReduce is divide and conquer: split the input into pieces, process each piece independently (map), then merge the partial results (reduce).
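The idea can be seen without Hadoop at all. The sketch below is plain Java with illustrative names (it is not the Hadoop API): the inner loop plays the role of the map step, emitting a (word, 1) pair per word, and `merge` plays the role of the shuffle/reduce step, summing the pairs that share a key.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MiniMapReduce {
    // In-memory map + shuffle/reduce: count words across all lines
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {                  // divide: one line at a time
            for (String word : line.split(" ")) {    // map: emit (word, 1)
                counts.merge(word, 1, Integer::sum); // reduce: sum per key
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("hello world", "hello mapreduce")));
    }
}
```

The real job below does exactly this, except the map and reduce halves run as separate distributed tasks and the grouping happens in Hadoop's shuffle.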
The examples below are meant for basic learning on small amounts of data, so there is no need to connect to a virtual machine. For each example, the result can be viewed in the part-r-00000 file inside the output folder the job creates.

Word count: given a file containing a number of words, count how many times each word appears.
package qfnu;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Mapper component: emit (word, 1) for every word in the input line
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String word : value.toString().split(" "))
            context.write(new Text(word), new IntWritable(1));
    }
}

// Reducer component: sum the 1s collected for each word
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        for (IntWritable v : values) count += v.get();
        context.write(key, new IntWritable(count));
    }
}

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));  // input folder
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output folder, e.g. D:/hadooptest/ansofcount
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
We can see the result in D:/hadooptest/ansofcount/part-r-00000.
An inverted index exists to make searching more convenient: instead of scanning every document for a word, you look the word up and immediately get the list of documents that contain it.
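Before the MapReduce version, it helps to see the shape of the result: a mapping from each word to the documents that contain it, here with per-document counts. A minimal in-memory sketch (the file names are made up for illustration):

```java
import java.util.Map;
import java.util.TreeMap;

public class MiniInvertedIndex {
    // Build word -> {docName -> count} from a docName -> text map
    static Map<String, Map<String, Integer>> build(Map<String, String> docs) {
        Map<String, Map<String, Integer>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String word : doc.getValue().split(" ")) {
                index.computeIfAbsent(word, w -> new TreeMap<>())
                     .merge(doc.getKey(), 1, Integer::sum);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("a.txt", "hello world hello");
        docs.put("b.txt", "hello hadoop");
        System.out.println(build(docs).get("hello")); // prints {a.txt=2, b.txt=1}
    }
}
```

The MapReduce version below produces the same structure, using "word:filename" composite keys so that the combiner can sum counts before the reducer joins the file lists.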
package qfnu;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Mapper: key is "word:filename", value is "1"
class InvertedIndexMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final Text keyInfo = new Text();
    private final Text valueInfo = new Text("1");
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
        for (String word : value.toString().split(" ")) {
            keyInfo.set(word + ":" + fileName);
            context.write(keyInfo, valueInfo);
        }
    }
}

// Combiner: sum the counts per "word:filename", then re-key by word alone
class InvertedIndexCombiner extends Reducer<Text, Text, Text, Text> {
    private final Text keyInfo = new Text();
    private final Text valueInfo = new Text();
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        for (Text v : values) count += Integer.parseInt(v.toString());
        int splitIndex = key.toString().indexOf(":");
        keyInfo.set(key.toString().substring(0, splitIndex));
        valueInfo.set(key.toString().substring(splitIndex + 1) + ":" + count);
        context.write(keyInfo, valueInfo);
    }
}

// Reducer: join all "filename:count" entries for each word
class InvertedIndexReducer extends Reducer<Text, Text, Text, Text> {
    private final Text valueInfo = new Text();
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        StringBuilder fileList = new StringBuilder();
        for (Text v : values) fileList.append(v.toString()).append(";");
        valueInfo.set(fileList.toString());
        context.write(key, valueInfo);
    }
}

public class InvertedIndexDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(InvertedIndexDriver.class);
        job.setMapperClass(InvertedIndexMapper.class);
        job.setCombinerClass(InvertedIndexCombiner.class);
        job.setReducerClass(InvertedIndexReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Deduplication: the shuffle stage groups identical keys together, so emitting every line as a key (with an empty value) removes duplicate records for free.

package qfnu;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Mapper: emit each line as the key; duplicates collapse in the shuffle
class DedupMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(value, NullWritable.get());
    }
}

// Reducer: each distinct line arrives exactly once as a key
class DedupReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<NullWritable> values, Context context)
            throws IOException, InterruptedException {
        context.write(key, NullWritable.get());
    }
}

public class DedupDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(DedupDriver.class);
        job.setMapperClass(DedupMapper.class);
        job.setReducerClass(DedupReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
TopN: find the N largest numbers in the input (N = 5 is assumed here as an example). Each mapper keeps its own local top N in a descending TreeMap and emits it in cleanup(); a single reducer then merges the candidates into the global top N.

package qfnu;

import java.io.IOException;
import java.util.Comparator;
import java.util.TreeMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Mapper: keep a local top N in a descending TreeMap, emit it in cleanup()
class TopNMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final int N = 5; // assumed N
    private final TreeMap<Integer, Integer> treeMap = new TreeMap<>(new Comparator<Integer>() {
        @Override
        public int compare(Integer a, Integer b) { return b - a; } // largest first
    });
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String num : value.toString().split(" ")) {
            treeMap.put(Integer.parseInt(num), 1);
            if (treeMap.size() > N) treeMap.remove(treeMap.lastKey()); // drop the smallest
        }
    }
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        for (Integer i : treeMap.keySet())
            context.write(new Text("top"), new IntWritable(i)); // one constant key -> one reduce group
    }
}

// Reducer: merge the per-mapper candidates and keep the global top N
class TopNReducer extends Reducer<Text, IntWritable, IntWritable, NullWritable> {
    private static final int N = 5;
    private final TreeMap<Integer, Integer> treeMap = new TreeMap<>(new Comparator<Integer>() {
        @Override
        public int compare(Integer a, Integer b) { return b - a; }
    });
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        for (IntWritable v : values) {
            treeMap.put(v.get(), 1);
            if (treeMap.size() > N) treeMap.remove(treeMap.lastKey());
        }
        for (Integer i : treeMap.keySet())
            context.write(new IntWritable(i), NullWritable.get());
    }
}

public class TopNDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(TopNDriver.class);
        job.setMapperClass(TopNMapper.class);
        job.setReducerClass(TopNReducer.class);
        job.setNumReduceTasks(1); // all candidates must meet in one reducer
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}