Using Flume to Monitor the Hive Log File and Upload It to HDFS (02)

# Name the agent a2, with source r2, sink k2 (an HDFS sink), and channel c2

a2.sources = r2

a2.sinks = k2

a2.channels = c2

# Describe/configure the source: an exec source that follows the local Hive log file

a2.sources.r2.type = exec

a2.sources.r2.command = tail -F /opt/modules/hive-0.13.1-cdh5.3.6/logs/hive.log

# Shell used to run the command above. Flume passes the command to it as /bin/bash -c "<command>", so the command may rely on shell features such as backticks or pipes

a2.sources.r2.shell = /bin/bash -c

# Describe the sink: write the events to HDFS

a2.sinks.k2.type = hdfs

a2.sinks.k2.hdfs.path = hdfs:

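# Prefix for the names of the uploaded files (property names are case-sensitive)

a2.sinks.k2.hdfs.filePrefix = events-hive-

# Whether to round the timestamp down, so that directories roll by time

a2.sinks.k2.hdfs.round = true

# How often a new directory is created: every 1 hour

a2.sinks.k2.hdfs.roundValue = 1

# Time unit for the rounding value

a2.sinks.k2.hdfs.roundUnit = hour

# Whether to use the local timestamp instead of a timestamp from the event header

a2.sinks.k2.hdfs.useLocalTimeStamp = true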
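
The rounding settings only affect the time escape sequences used in hdfs.path, and the path in this post is truncated. The fragment below is therefore only a sketch: the NameNode address namenode.example.com:8020 and the /flume directory are hypothetical placeholders, not the author's values. With these settings every event in the same hour expands to the same %Y%m%d/%H directory, so a new directory appears once per hour.

```properties
# Hypothetical path for illustration only; replace host, port, and directory with your own
a2.sinks.k2.hdfs.path = hdfs://namenode.example.com:8020/flume/%Y%m%d/%H
# Round the timestamp down to the hour before the escape sequences are expanded
a2.sinks.k2.hdfs.round = true
a2.sinks.k2.hdfs.roundValue = 1
a2.sinks.k2.hdfs.roundUnit = hour
# Use the agent's local time, because exec-source events carry no timestamp header
a2.sinks.k2.hdfs.useLocalTimeStamp = true
```

If the path also contained a finer-grained escape such as %M, rounding down to the hour would pin it to 00.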

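# Number of events to accumulate before flushing to HDFS

a2.sinks.k2.hdfs.batchSize = 1000

# File type; DataStream writes uncompressed output, and compressed types are also supported

a2.sinks.k2.hdfs.fileType = DataStream

# How often a new file is started: 600 seconds (10 minutes)

a2.sinks.k2.hdfs.rollInterval = 600

# File size in bytes that triggers a new file (just under 128 MiB), even if 600 seconds have not yet passed

a2.sinks.k2.hdfs.rollSize = 134217700

# 0 means file rolling does not depend on the number of events

a2.sinks.k2.hdfs.rollCount = 0

# Minimum number of replicas per HDFS block

a2.sinks.k2.hdfs.minBlockReplicas = 1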

# Use a channel that buffers events in memory: up to 1000 events in total, at most 100 per transaction

a2.channels.c2.type = memory

a2.channels.c2.capacity = 1000

a2.channels.c2.transactionCapacity = 100
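
One thing to watch: the HDFS sink takes up to hdfs.batchSize events from the channel in a single transaction, while the memory channel allows at most transactionCapacity events per transaction. With a batch size of 1000 and a transaction capacity of 100, a burst of log lines can fail with a ChannelException. A possible adjustment, offered as an assumption rather than as the author's setting, is to make the transaction capacity at least as large as the batch size (the values below are illustrative):

```properties
a2.channels.c2.type = memory
# Total number of events the channel can hold; must be >= transactionCapacity
a2.channels.c2.capacity = 10000
# Events per transaction; chosen here to match hdfs.batchSize = 1000
a2.channels.c2.transactionCapacity = 1000
```

The alternative is to leave the channel as it is and lower hdfs.batchSize to 100.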

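# Bind the source and the sink to the channel. A source can feed several channels, but a sink reads from exactly one

a2.sources.r2.channels = c2

a2.sinks.k2.channel = c2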

