Data Middle Platform: Flume

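The ODS layer is populated by data synchronization, which is either offline or real-time. Offline synchronization can use FlinkX or DataX (relational database -> Hive), while real-time synchronization can use Flume (Kafka -> Hive).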

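Once data is synchronized to Hive in real time, the offline data warehouse can process same-day data requirements (for example, every 15 minutes within the day).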

Flume first writes the data under the partition path of the Hive table; the data is then loaded into the Hive table.
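The second step, making the files Flume has written visible to Hive, can be sketched as building an `ALTER TABLE ... ADD PARTITION` statement for each 15-minute window. This is only an illustration under assumptions: the table name `ods.kafka_log`, the base path `/warehouse/ods/kafka_log`, and the partition columns `dt` and `hm` are hypothetical and do not come from the original setup.

```python
from datetime import datetime


def add_partition_sql(table: str, base_path: str, window_start: datetime) -> str:
    """Build a HiveQL statement that registers one 15-minute partition.

    The partition columns `dt` (day) and `hm` (hour and minute) are
    hypothetical; use whatever layout the Flume sink path really produces.
    """
    dt = window_start.strftime("%Y%m%d")
    hm = window_start.strftime("%H%M")
    location = f"{base_path}/dt={dt}/hm={hm}"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt='{dt}', hm='{hm}') LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical table and path, for illustration only.
    print(add_partition_sql("ods.kafka_log", "/warehouse/ods/kafka_log",
                            datetime(2021, 3, 5, 10, 30)))
```

The resulting statement could be run with the Hive CLI or Beeline from a scheduler after each window closes. `IF NOT EXISTS` keeps the step safe to repeat.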

First, create a new file named example.conf in Flume's conf directory:

#-------- Names of the Flume sources, channels, and sinks -----------------
# Source name
a1.sources = kafkasource
# Channel name; naming it after its type is recommended
a1.channels = memorychannel
# Sink name; naming it after its destination is recommended
a1.sinks = hdfssink
# Channel used by the source
a1.sources.kafkasource.channels = memorychannel
# Channel used by the sink; note that the property here is "channel" (singular)
a1.sinks.hdfssink.channel = memorychannel

#-------- kafkasource configuration -----------------
# Source type
a1.sources.kafkasource.type = org.apache.flume.source.kafka.KafkaSource
# Address of the ZooKeeper used by Kafka
a1.sources.kafkasource.zookeeperConnect = kafka.port
# Kafka topic to consume
a1.sources.kafkasource.topic = flume
# Consumer group ID
a1.sources.kafkasource.groupId = flume1
# Consumer timeout. Any other Kafka consumer option can be set the same way:
# properties starting with "kafka." are passed through as consumer settings
a1.sources.kafkasource.kafka.consumer.timeout.ms = 100

#------- memorychannel configuration -------------------------
# Channel type
a1.channels.memorychannel.type = memory
# Maximum number of events held in the channel
a1.channels.memorychannel.capacity = 1000
# Maximum number of events per transaction
a1.channels.memorychannel.transactionCapacity = 1000

#--------- hdfssink configuration ------------------
a1.sinks.hdfssink.type = hdfs
# hdfs.path is required by the HDFS sink but is not shown in this example;
# point it at the partition path of the target Hive table
a1.sinks.hdfssink.hdfs.filePrefix = log_%Y%m%d_%H%M%S
a1.sinks.hdfssink.hdfs.fileSuffix = .lzo
# Minimum number of replicas per HDFS block
a1.sinks.hdfssink.hdfs.minBlockReplicas = 1
# Whether to round the timestamp down
a1.sinks.hdfssink.hdfs.round = true
a1.sinks.hdfssink.hdfs.roundValue = 15
a1.sinks.hdfssink.hdfs.roundUnit = minute
# File format: SequenceFile, DataStream, or CompressedStream
a1.sinks.hdfssink.hdfs.fileType = DataStream
# Record format for sequence files: Text or Writable (default)
a1.sinks.hdfssink.hdfs.writeFormat = Text
# Roll the temporary file into a final file once it reaches this size in bytes (0 disables size-based rolling)
a1.sinks.hdfssink.hdfs.rollSize = 0
# Interval, in seconds, after which the temporary file is rolled into a final file
a1.sinks.hdfssink.hdfs.rollInterval = 60
# Roll the temporary file into a final file once it holds this many events (0 disables count-based rolling)
a1.sinks.hdfssink.hdfs.rollCount = 0
# Timeout for HDFS operations, in milliseconds
a1.sinks.hdfssink.hdfs.callTimeout = 60000
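The rounding settings in this configuration (`hdfs.round = true` with a round value of 15 and a round unit of minute) make the sink round each event timestamp down to a 15-minute boundary before it is substituted into time-based escape sequences. The following is a minimal sketch of that arithmetic only, not Flume's own code; the function name `round_down` is an illustrative assumption.

```python
from datetime import datetime


def round_down(ts: datetime, minutes: int = 15) -> datetime:
    """Round ts down to the nearest multiple of `minutes` within its hour."""
    floored_minute = ts.minute - ts.minute % minutes
    return ts.replace(minute=floored_minute, second=0, microsecond=0)


if __name__ == "__main__":
    event_time = datetime(2021, 3, 5, 10, 37, 42)
    # 10:37:42 falls in the window that starts at 10:30:00.
    print(round_down(event_time).strftime("%Y%m%d_%H%M%S"))  # 20210305_103000
```

With this setting, all events from the same 15-minute window resolve to the same rounded time, which is what lets the downstream job pick up one window at a time.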

Then start the agent from the Flume installation directory:

./bin/flume-ng agent -c conf -f conf/example.conf -n a1 -Dflume.root.logger=INFO,console

