To collect a directory with Flume, the HDFS cluster must be running first.
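As a quick check (a sketch assuming a standard Hadoop install with its sbin scripts on the PATH):
start-dfs.sh              # start the NameNode and DataNodes
hdfs dfsadmin -report     # confirm the DataNodes have registered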
vi spool-hdfs.conf
# name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# describe/configure the source
## Note: never drop a file with a duplicate name into the monitored directory
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /root/logs2
a1.sources.r1.fileHeader = true
# describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/
a1.sinks.k1.hdfs.filePrefix = events-
# control the rolling frequency of the folder (time-based bucketing)
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
# control the rolling frequency of the file:
# rollInterval = time dimension (seconds), rollSize = size dimension (bytes), rollCount = event-count dimension
a1.sinks.k1.hdfs.rollInterval = 3
a1.sinks.k1.hdfs.rollSize = 20
a1.sinks.k1.hdfs.rollCount = 5
a1.sinks.k1.hdfs.batchSize = 1
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# generated file type; the default is SequenceFile, use DataStream for plain text
a1.sinks.k1.hdfs.fileType = DataStream
# use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
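A note on the roll* settings above: whichever threshold is reached first triggers a file roll, and a value of 0 disables that dimension. A sketch of production-leaning values (illustrative, not from the original config):
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
This rolls only by size, close to an HDFS block, so the sink writes fewer, larger files.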
mkdir /root/logs2
The spooldir source monitors the specified directory: whenever a new file appears there, it is collected.
Start command:
bin/flume-ng agent -c ./conf -f ./conf/spool-hdfs.conf -n a1 -Dflume.root.logger=INFO,console
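Once the agent is up, a quick smoke test (the file name here is illustrative) is to drop a uniquely named file into the spooled directory and check HDFS:
cp /etc/hosts /root/logs2/hosts-$(date +%s).log   # unique name on every drop
hdfs dfs -ls -R /flume/events/                    # rolled event files should appear here
Flume renames each processed file with a .COMPLETED suffix, which is why a file with the same name must never be dropped in twice.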
vi tail-hdfs.conf
# name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /root/logs/test.log
a1.sources.r1.channels = c1
# describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/tailout/%y-%m-%d/%H-%M/
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.rollInterval = 3
a1.sinks.k1.hdfs.rollSize = 20
a1.sinks.k1.hdfs.rollCount = 5
a1.sinks.k1.hdfs.batchSize = 1
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# generated file type; the default is SequenceFile, use DataStream for plain text
a1.sinks.k1.hdfs.fileType = DataStream
# use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
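The %y-%m-%d/%H-%M escapes in hdfs.path require a timestamp in each event's headers; hdfs.useLocalTimeStamp = true supplies one at the sink. An alternative sketch is to stamp events at the source with Flume's timestamp interceptor:
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp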
mkdir /root/logs
Start command:
bin/flume-ng agent -c conf -f conf/tail-hdfs.conf -n a1
The exec source runs a shell command (e.g. tail -f sx.log) and collects the file's changes in real time.
Script to simulate data generation:
while true; do date >> /root/logs/test.log; sleep 0.5; done
Or, as a script:
#!/bin/bash
while true
do
    date >> /root/logs/test.log
    sleep 1
done
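With the generator running and the agent started, the data landing in HDFS can be inspected (paths follow the config above):
hdfs dfs -ls -R /flume/tailout/
hdfs dfs -cat /flume/tailout/*/*/events-*
The output is readable text because fileType = DataStream; files still being written carry a .tmp suffix until they roll.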