Common Hive parameters

Hive parameters

hive.exec.max.created.files

Description: the maximum total number of files that all map and reduce tasks of a Hive job may create.

Default: 100000

hive.exec.dynamic.partition

Description: whether dynamic partitioning is enabled.

Default: false

hive.mapred.reduce.tasks.speculative.execution

Description: whether speculative execution is enabled for reduce tasks.

Default: true

hive.input.format

Description: the default input format used by Hive.

Default: org.apache.hadoop.hive.ql.io.CombineHiveInputFormat

If this format causes problems, use org.apache.hadoop.hive.ql.io.HiveInputFormat instead.

hive.exec.counters.pull.interval

Description: the interval at which Hive polls the JobTracker for counter information.

Default: 1000 ms

hive.script.recordreader

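Description: the default record reader class used when reading from a user script.

Default: org.apache.hadoop.hive.ql.exec.TextRecordReader

hive.script.recordwriter

Description: the default record writer class used when writing data to a user script.

Default: org.apache.hadoop.hive.ql.exec.TextRecordWriter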

hive.mapjoin.check.memory.rows

Description: the number of rows processed in memory between memory-usage checks during a map join.

Default: 100000

hive.mapjoin.smalltable.filesize

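Description: the file-size threshold (in bytes) for the small input table. If the small table's files are below this value, Hive tries to convert the common join into a map join.

Default: 25000000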

hive.auto.convert.join

Description: whether to convert a common join into a map join automatically, based on the size of the input files.

Default: false
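
The way hive.auto.convert.join and hive.mapjoin.smalltable.filesize interact can be sketched as a simple size check. This is only an illustration of the rule described above, not Hive's actual planner code; the function name should_try_map_join and its arguments are hypothetical.

```python
def should_try_map_join(small_table_bytes,
                        auto_convert_join=False,
                        smalltable_filesize=25_000_000):
    """Return True when a common join would be a map-join candidate.

    Hypothetical helper: it models only the two settings listed above
    (hive.auto.convert.join and hive.mapjoin.smalltable.filesize).
    """
    # Automatic conversion must be switched on first.
    if not auto_convert_join:
        return False
    # The small table's files must be below the threshold.
    return small_table_bytes < smalltable_filesize


if __name__ == "__main__":
    # A 10 MB table with auto conversion on is a candidate.
    print(should_try_map_join(10_000_000, auto_convert_join=True))
    # With the default (false), no conversion is attempted.
    print(should_try_map_join(10_000_000))
```

With the defaults listed here, no join is converted until hive.auto.convert.join is set to true.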

hive.mapjoin.followby.gby.localtask.max.memory.usage

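Description: when a map join is followed by a group by, the fraction of memory the local task may use to hold data in memory. If the data exceeds this limit, it is not kept in memory.

Default: 0.55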

hive.mapjoin.localtask.max.memory.usage

Description: the fraction of memory a local task may use.

Default: 0.90

hive.heartbeat.interval

Description: the interval at which a heartbeat is sent during map join and filter operations.

Default: 1000

hive.merge.size.per.task

Description: the size of the merged files, in bytes.

Default: 256000000

hive.mergejob.maponly

Description: whether the merge of output files is run as a map-only job.

Default: true

hive.merge.mapredfiles

Description: whether to merge small files at the end of a map-reduce job.

Default: false

hive.merge.mapfiles

Description: whether to merge small files at the end of a map-only job.

Default: true

hive.hwi.listen.host

Description: the default host the Hive web UI listens on.

Default: 0.0.0.0

hive.hwi.listen.port

Description: the port the web UI listens on.

Default: 9999

hive.exec.parallel.thread.number

Description: the number of threads Hive uses to run jobs in parallel.

Default: 8

hive.exec.parallel

Description: whether jobs are submitted in parallel.

Default: false

hive.exec.compress.output

Description: whether query output is compressed.

Default: false

hive.mapred.mode

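Description: the restriction mode for Hive operations. In strict mode some risky queries are rejected; in nonstrict mode there are no such restrictions.

Default: nonstrict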

hive.join.cache.size

Description: the number of rows that may be cached in memory during a join.

Default: 25000

hive.mapjoin.cache.numrows

Description: the number of rows a map join keeps in memory.

Default: 25000

hive.join.emit.interval

Description: during a join, the number of rows Hive buffers before emitting the join result.

Default: 1000

hive.optimize.groupby

Description: whether group-by aggregation makes use of bucketed tables.

Default: true

hive.fileformat.check

Description: whether to check the format of input files.

Default: true

hive.metastore.client.connect.retry.delay

Description: the interval between retries when a client connection fails.

Default: 1 second

hive.metastore.client.socket.timeout

Description: the client socket timeout.

Default: 20 seconds

mapred.reduce.tasks

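Description: the default number of reduce tasks per job. A value of -1 means the number of reducers is set automatically according to the job.

Default: -1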

hive.exec.reducers.bytes.per.reducer

Description: the amount of data each reducer receives. For example, if 10 GB is sent to the reduce stage, 10 reduce tasks are created.

Default: 1000000000 (1 GB)

hive.exec.reducers.max

Description: the maximum number of reducers.

Default: 999
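
The three reducer settings above (mapred.reduce.tasks, hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max) combine roughly as in the following sketch. It is a simplified model under the defaults listed here, not Hive's own code; the function name estimate_reducers is hypothetical.

```python
def estimate_reducers(input_bytes,
                      bytes_per_reducer=1_000_000_000,
                      max_reducers=999,
                      reduce_tasks=-1):
    """Estimate the reducer count for a job (illustrative only).

    reduce_tasks models mapred.reduce.tasks: a positive value is used
    as-is, while -1 means "derive the count from the input size".
    """
    if reduce_tasks > 0:
        return reduce_tasks
    # Ceiling division: one reducer per bytes_per_reducer of input.
    estimated = (input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    # At least one reducer, and never more than hive.exec.reducers.max.
    return max(1, min(max_reducers, estimated))


if __name__ == "__main__":
    # 10 GB of reduce input with 1 GB per reducer.
    print(estimate_reducers(10_000_000_000))
```

Under these assumptions, lowering hive.exec.reducers.bytes.per.reducer increases the reducer count for the same input, up to the cap set by hive.exec.reducers.max.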

hive.metastore.warehouse.dir

Description: the default storage location for databases.

Default: /user/hive/warehouse

hive.default.fileformat

Description: the default file format.

Default: TextFile

hive.map.aggr

Description: whether to aggregate on the map side, which works much like a combiner.

Default: true

hive.exec.max.dynamic.partitions.pernode

Description: the maximum number of dynamic partitions each task node may create.

Default: 100

hive.exec.max.dynamic.partitions

Description: the maximum total number of dynamic partitions that may be created.

Default: 1000
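
The two dynamic-partition limits can be checked ahead of a load with a small script like the one below. This is a rough sketch that assumes you already know which partitions each task node would write; the function check_dynamic_partition_limits and its input shape are hypothetical and not part of Hive.

```python
def check_dynamic_partition_limits(partitions_by_node,
                                   max_per_node=100,
                                   max_total=1000):
    """Return a list of limit violations (empty if none).

    partitions_by_node maps a node name to the set of partition names
    that node would create. The limits model
    hive.exec.max.dynamic.partitions.pernode and
    hive.exec.max.dynamic.partitions.
    """
    problems = []
    for node, parts in sorted(partitions_by_node.items()):
        if len(parts) > max_per_node:
            problems.append(
                f"{node}: {len(parts)} partitions exceed the per-node limit {max_per_node}")
    # The total counts each distinct partition once across all nodes.
    total = len(set().union(*partitions_by_node.values()))
    if total > max_total:
        problems.append(f"total: {total} partitions exceed the limit {max_total}")
    return problems


if __name__ == "__main__":
    plan = {"node1": {f"dt={i}" for i in range(101)}}
    for problem in check_dynamic_partition_limits(plan):
        print(problem)
```

If either limit is too low for a legitimate load, the corresponding parameter can be raised for that session.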

hive.metastore.server.max.threads

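Description: the default maximum number of worker threads in the metastore.

Default: 100000

hive.metastore.server.min.threads

Description: the default minimum number of worker threads in the metastore.

Default: 200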
