Flume is a highly available, reliable, distributed system from Cloudera for collecting, aggregating, and transporting large volumes of log data. Flume supports custom data senders plugged into a logging pipeline to collect data, and it can also perform simple processing on the data before writing it out to a variety of (customizable) data receivers.
Importing data from Kafka into HDFS with Flume
The configuration file is as follows:
flumetohdfs_agent.sources = source_from_kafka
flumetohdfs_agent.channels = mem_channel
flumetohdfs_agent.sinks = hdfs_sink
#auto.commit.enable = true
## kerberos config ##
#flumetohdfs_agent.sinks.hdfs_sink.hdfs.kerberosPrincipal = flume/[email protected]
#flumetohdfs_agent.sinks.hdfs_sink.hdfs.kerberosKeytab = /root/apache-flume-1.6.0-bin/conf/flume.keytab
# For each one of the sources, the type is defined
flumetohdfs_agent.sources.source_from_kafka.type = org.apache.flume.source.kafka.KafkaSource
flumetohdfs_agent.sources.source_from_kafka.zookeeperConnect = 10.129.142.46:2181,10.166.141.46:2181,10.166.141.47:2181/testkafka
flumetohdfs_agent.sources.source_from_kafka.topic = itil_topic_4097
#flumetohdfs_agent.sources.source_from_kafka.batchSize = 10000
flumetohdfs_agent.sources.source_from_kafka.groupId = flume4097
flumetohdfs_agent.sources.source_from_kafka.channels = mem_channel
# The sink is defined as follows.
flumetohdfs_agent.sinks.hdfs_sink.type = hdfs
#flumetohdfs_agent.sinks.hdfs_sink.hdfs.filePrefix = %
flumetohdfs_agent.sinks.hdfs_sink.hdfs.path = hdfs:
## roll every hour (after gz)
flumetohdfs_agent.sinks.hdfs_sink.hdfs.rollSize = 0
flumetohdfs_agent.sinks.hdfs_sink.hdfs.rollCount = 0
flumetohdfs_agent.sinks.hdfs_sink.hdfs.rollInterval = 3600
flumetohdfs_agent.sinks.hdfs_sink.hdfs.threadsPoolSize = 300
#flumetohdfs_agent.sinks.hdfs_sink.hdfs.codeC = gzip
#flumetohdfs_agent.sinks.hdfs_sink.hdfs.fileType = CompressedStream
flumetohdfs_agent.sinks.hdfs_sink.hdfs.fileType = DataStream
flumetohdfs_agent.sinks.hdfs_sink.hdfs.writeFormat = Text
# Specify the channel the sink should use
flumetohdfs_agent.sinks.hdfs_sink.channel = mem_channel
# each channel's type is defined.
flumetohdfs_agent.channels.mem_channel.type = memory
# other config values specific to each type of channel(sink or source)
# can be defined as well
# in this case, it specifies the capacity of the memory channel
flumetohdfs_agent.channels.mem_channel.capacity = 100000
flumetohdfs_agent.channels.mem_channel.transactionCapacity = 10000
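Flume property keys are case-sensitive and silently ignored when misspelled, so it pays to sanity-check the wiring (every source and sink must reference a channel the agent declares) before starting the agent. A minimal sketch of such a check; `parse_properties` and `check_wiring` are hypothetical helpers written for illustration, not part of Flume:

```python
# Minimal sketch (not Flume code): parse a Flume-style .properties file
# and verify that sources/sinks reference declared channels.

def parse_properties(text):
    """Parse 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def check_wiring(props, agent):
    """Return a list of wiring errors for the given agent name."""
    channels = set(props.get(f"{agent}.channels", "").split())
    errors = []
    for src in props.get(f"{agent}.sources", "").split():
        used = set(props.get(f"{agent}.sources.{src}.channels", "").split())
        if not used <= channels:
            errors.append(f"source {src} uses undeclared channel(s)")
    for snk in props.get(f"{agent}.sinks", "").split():
        ch = props.get(f"{agent}.sinks.{snk}.channel", "")
        if ch not in channels:
            errors.append(f"sink {snk} uses undeclared channel {ch!r}")
    return errors

conf = """
flumetohdfs_agent.sources = source_from_kafka
flumetohdfs_agent.channels = mem_channel
flumetohdfs_agent.sinks = hdfs_sink
flumetohdfs_agent.sources.source_from_kafka.channels = mem_channel
flumetohdfs_agent.sinks.hdfs_sink.channel = mem_channel
"""
print(check_wiring(parse_properties(conf), "flumetohdfs_agent"))  # []
```

An empty error list means the source, channel, and sink are connected consistently, which is exactly the topology the configuration above defines.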
Start the agent:
./flume-ng agent --conf ../conf/ -n flumetohdfs_agent -f ../conf/flume-conf-4097.properties
The agent name (-n flumetohdfs_agent) must match the agent name used in the configuration file. By default the HDFS sink writes SequenceFile output, which cannot be opened and browsed directly; to get plain text, set the output format to:
flumetohdfs_agent.sinks.hdfs_sink.hdfs.fileType = DataStream
flumetohdfs_agent.sinks.hdfs_sink.hdfs.writeFormat = Text
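The name-matching requirement can be checked mechanically: every key in the properties file begins with the agent name, so that prefix must equal the value passed to `-n`. A hypothetical helper sketching the check (not part of Flume):

```python
# Hypothetical sketch: confirm the agent name passed to flume-ng via -n
# actually prefixes the keys in the properties file.

def agent_names(conf_text):
    """Collect the first dotted component of every non-comment key."""
    names = set()
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        names.add(key.split(".", 1)[0])
    return names

conf = ("flumetohdfs_agent.sources = source_from_kafka\n"
        "flumetohdfs_agent.channels = mem_channel\n")
assert "flumetohdfs_agent" in agent_names(conf)  # matches -n flumetohdfs_agent
```

If `-n` is given a name that does not appear in the file, the agent starts with no components configured and simply does nothing, which is a common and confusing failure mode.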
Compressed output can also be configured:
flumetohdfs_agent.sinks.hdfs_sink.hdfs.codeC = gzip
flumetohdfs_agent.sinks.hdfs_sink.hdfs.fileType = CompressedStream
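The practical difference between the two modes: `DataStream` with `Text` writes plain, directly viewable text, while `CompressedStream` with gzip writes gzip bytes that must be decompressed first. A small illustration of that difference using Python's gzip module (illustrative only, not Flume code):

```python
import gzip

# Simulate the two output modes: DataStream writes plain text,
# CompressedStream + gzip writes gzip-compressed bytes.
event = "2016-01-01 12:00:00 INFO sample log line\n"

plain = event.encode("utf-8")        # DataStream / Text: viewable as-is
compressed = gzip.compress(plain)    # CompressedStream + gzip

# Compressed output starts with the gzip magic number and is not directly
# readable; decompressing recovers the original text.
assert compressed[:2] == b"\x1f\x8b"
assert gzip.decompress(compressed).decode("utf-8") == event
```

Tools like `hdfs dfs -text` can decompress such files on the fly, at the cost of CPU time; gzip typically shrinks repetitive log data substantially, which is why compression is attractive for hourly-rolled log files.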