Storing Data in HDFS with Apache Flume

These notes are based on Hadoop 2.7.3 and Apache Flume 1.8.0. The Flume source is netcat, the Flume channel is memory, and the Flume sink is hdfs.

1. Write the Flume configuration file

Configure a Flume agent, named shaman here. The configuration file (netcat-memory-hdfs.conf) is as follows:

# identify the components on agent shaman:
shaman.sources = netcat_s1
shaman.sinks = hdfs_w1
shaman.channels = in-mem_c1

# configure the source:
shaman.sources.netcat_s1.type = netcat
shaman.sources.netcat_s1.bind = localhost
shaman.sources.netcat_s1.port = 44444

# describe the sink (property names and values are case-sensitive):
shaman.sinks.hdfs_w1.type = hdfs
shaman.sinks.hdfs_w1.hdfs.path = hdfs://localhost:8020/user/root/test
shaman.sinks.hdfs_w1.hdfs.writeFormat = Text
shaman.sinks.hdfs_w1.hdfs.fileType = DataStream

# configure a channel that buffers events in memory:
shaman.channels.in-mem_c1.type = memory
shaman.channels.in-mem_c1.capacity = 20000
shaman.channels.in-mem_c1.transactionCapacity = 100

# bind the source and sink to the channel:
shaman.sources.netcat_s1.channels = in-mem_c1
shaman.sinks.hdfs_w1.channel = in-mem_c1

Note:

In hdfs://localhost:8020/user/root/test, the prefix hdfs://localhost:8020 is the value of the fs.defaultFS property in the Hadoop configuration file core-site.xml, and root is the Hadoop login user.
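
The note above can be sketched as a small shell helper that joins the fs.defaultFS value with the target directory to produce the value for hdfs.path. The function name build_sink_path is hypothetical and only for illustration; the commented command assumes the hdfs CLI is on the PATH.

```shell
# Hypothetical helper: join an fs.defaultFS value with an HDFS directory
# to form the value used for the sink's hdfs.path property.
build_sink_path() {
  # $1: fs.defaultFS value, e.g. hdfs://localhost:8020
  # $2: absolute directory inside HDFS, e.g. /user/root/test
  # ${1%/} strips one trailing slash so the result has no double slash.
  printf '%s%s\n' "${1%/}" "$2"
}

# With Hadoop installed, the live value can be read from the configuration:
#   build_sink_path "$(hdfs getconf -confKey fs.defaultFS)" /user/root/test
build_sink_path hdfs://localhost:8020 /user/root/test
```

If the printed path differs from the hdfs.path in netcat-memory-hdfs.conf, the sink is probably pointing at the wrong NameNode address.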

2. Start Flume

bin/flume-ng agent -f agent/netcat-memory-hdfs.conf -n shaman -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true -Dorg.apache.flume.log.rawdata=true

3. Open a telnet client and type some text to test

telnet localhost 44444

Then type some text. The netcat source turns each line into one event.
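
As an alternative to typing by hand, the test input can be generated and piped to the source. This is only a sketch: make_events is a hypothetical helper name, and sending the lines assumes that nc (netcat) is installed and that the agent from step 2 is listening on localhost:44444.

```shell
# Hypothetical helper: print N numbered test lines, one per line.
# The netcat source treats every newline-terminated line as one event.
make_events() {
  i=1
  while [ "$i" -le "$1" ]; do
    printf 'test event %d\n' "$i"
    i=$((i + 1))
  done
}

# Preview the lines that would be sent.
make_events 3
```

To send them to the agent, pipe the output into netcat: make_events 3 | nc localhost 44444. Depending on the netcat variant, an option such as -q or -N may be needed for nc to exit once the input ends.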

4. Check the HDFS test directory

hdfs dfs -ls /user/root/test

New files appear in the directory, and their contents are the text typed through telnet.
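
With the sink's default settings, a file that is still being written carries a .tmp suffix until it is rolled, so the listing can contain both finished and in-progress files. A minimal sketch for picking out only the finished ones follows; closed_files is a hypothetical helper name, and it assumes the usual hdfs dfs -ls layout with the path in the last column and no spaces in file names.

```shell
# Hypothetical helper: read `hdfs dfs -ls` output on stdin and print the
# paths of regular files that are no longer open (no .tmp suffix).
closed_files() {
  # Lines for regular files start with "-" (the permission string);
  # the "Found N items" header and directory lines are skipped.
  awk '/^-/ && $NF !~ /\.tmp$/ { print $NF }'
}

# Usage, assuming the hdfs CLI is on the PATH:
#   hdfs dfs -ls /user/root/test | closed_files
```

Each path it prints can then be passed to hdfs dfs -cat to view the stored text.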

References:

1. Hadoop for Dummies

2. Flume 1.8.0 User Guide
