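Collecting files into HDFS with Flume

This walkthrough assumes Flume and Hadoop are already installed.

1. Pitfalls encountered

When installing Hadoop, pay close attention to the core-site.xml configuration.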

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>

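The value above must use the hostname (master) or the machine's IP address. Do not use localhost (I tested it and it fails) or 127.0.0.1, otherwise errors keep being reported.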
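As a quick sanity check for this pitfall, the following is a minimal sketch of a shell helper that flags an fs.defaultFS value pointing at a loopback host. The function name is_loopback_uri is hypothetical and only does string matching on the URI. It does not contact the cluster.

```shell
# Hypothetical helper: succeeds (exit 0) when the host part of an
# hdfs:// URI is localhost or a 127.x.x.x loopback address.
is_loopback_uri() {
  host=${1#*://}        # drop the scheme, e.g. "hdfs://"
  host=${host%%[:/]*}   # drop the port and any path
  case "$host" in
    localhost|127.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example values taken from the text above.
for uri in hdfs://master:9000 hdfs://localhost:9000 hdfs://127.0.0.1:9000; do
  if is_loopback_uri "$uri"; then
    echo "$uri: loopback host, change it to the hostname or IP"
  else
    echo "$uri: ok"
  fi
done
```

On a machine with Hadoop on the PATH, the configured value can be read with hdfs getconf -confKey fs.defaultFS and passed to the helper.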

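2. Flume configuration file

Create a test directory under the Flume installation directory and write the configuration file inside it.

vim test-flume-hdfs.conf

Its contents are: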

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /usr/local/flume/test/test-flume-hdfs.log
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://(IP or hostname):9000/user/root/test/%y-%m-%d/%H-%M/
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.rollInterval = 3
a1.sinks.k1.hdfs.rollSize = 20
a1.sinks.k1.hdfs.rollCount = 5
a1.sinks.k1.hdfs.batchSize = 1
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# File type to write. The default is SequenceFile; DataStream writes plain text.
a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
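The round, roundValue and roundUnit settings, together with the date escapes in hdfs.path, decide which directory an event lands in: the timestamp is rounded down to the last multiple of 10 minutes before the path is formatted. The sketch below reproduces that arithmetic in shell so the expected directory name can be predicted. The function name bucket_dir is hypothetical, it assumes GNU date (for the -d "@epoch" form), and it is an approximation of the sink's behavior, not Flume's own code.

```shell
# Hypothetical helper: print the "%y-%m-%d/%H-%M" directory suffix for an
# epoch timestamp, with minutes rounded down to a multiple of 10.
# Assumes GNU date; the result depends on the local time zone (TZ).
bucket_dir() {
  ts="$1"
  day=$(date -d "@$ts" +%y-%m-%d)
  hour=$(date -d "@$ts" +%H)
  min=$(date -d "@$ts" +%M)
  min=${min#0}   # strip a leading zero so "08" is not parsed as octal
  printf '%s/%s-%02d\n' "$day" "$hour" $(( min / 10 * 10 ))
}

# Directory suffix that events written right now would be expected to use.
bucket_dir "$(date +%s)"
```

For example, an event stamped at 22:13 would be expected under the .../22-10/ directory, and one stamped at 22:20 under .../22-20/.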

3. Start Hadoop

sbin/start-all.sh

4. Start Flume

bin/flume-ng agent -c conf -f test/test-flume-hdfs.conf -n a1 -Dflume.root.logger=INFO,console

5. Write the test file

In Flume's test directory, create the file test-flume-hdfs.log, write the following content into it, and save:

test flume-hdfs

hello flume hdfs
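Instead of editing the file by hand, the same two lines can be appended from the shell, which is convenient for repeated tests. This is a minimal sketch; the function name append_sample_lines is hypothetical, and the log path is passed in as an argument.

```shell
# Hypothetical helper: append the two sample lines to the log file
# that the exec source is tailing.
append_sample_lines() {
  log_file="$1"
  printf '%s\n' 'test flume-hdfs' 'hello flume hdfs' >> "$log_file"
}
```

With the configuration above it would be called as append_sample_lines /usr/local/flume/test/test-flume-hdfs.log. Each call adds two more lines, so the agent should pick up two new events per call.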

Result
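One way to confirm that the sink wrote something is to list the target directory recursively and count the files whose names start with the events- prefix from the configuration. The sketch below assumes the hdfs command is on the PATH and that the sink path is /user/root/test as configured above. The function name count_event_files is hypothetical.

```shell
# Hypothetical helper: count files under an HDFS directory whose names
# start with the "events-" prefix set by hdfs.filePrefix.
count_event_files() {
  hdfs dfs -ls -R "$1" | grep -c '/events-'
}
```

Running count_event_files /user/root/test after writing the test lines should print a number greater than zero if the pipeline works. The content of an individual file can then be inspected with hdfs dfs -cat followed by one of the listed paths.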
