Hive Study Notes: Hive Parameters

Part 1: Hive parameters

hive.exec.max.created.files

•Description: the maximum total number of files that all map and reduce tasks of a Hive job may create

•Default: 100000

hive.exec.dynamic.partition

•Description: whether dynamic partitioning is enabled

•Default: false

hive.mapred.reduce.tasks.speculative.execution

•Description: whether speculative execution is enabled

•Default: true

hive.input.format

•Description: the default input format used by Hive

•Default: org.apache.hadoop.hive.ql.io.CombineHiveInputFormat

•If this causes problems, org.apache.hadoop.hive.ql.io.HiveInputFormat can be used instead

hive.exec.counters.pull.interval

•Description: the interval at which Hive polls the JobTracker for counter information

•Default: 1000 ms

hive.script.recordreader

•Description: the default record reader class used when running scripts

•Default: org.apache.hadoop.hive.ql.exec.TextRecordReader

hive.script.recordwriter

•Description: the default record writer class used when running scripts

•Default: org.apache.hadoop.hive.ql.exec.TextRecordWriter

hive.mapjoin.check.memory.rows

•Description: the number of rows processed between memory usage checks during a map join

•Default: 100000

hive.mapjoin.smalltable.filesize

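•Description: the file size threshold for the small input table; if the small table is below this size, Hive tries to convert the common join into a map join

•Default: 25000000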

hive.auto.convert.join

•Description: whether to convert a common join into a map join based on the size of the input files

•Default: false
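As a rough illustration of how hive.mapjoin.smalltable.filesize and hive.auto.convert.join interact, the Python sketch below models the decision. The function name and the single-condition rule are assumptions made for illustration only; Hive's actual planner takes more into account than one file size.

```python
# Simplified model of the map-join conversion decision (not Hive's real planner).
SMALLTABLE_FILESIZE = 25_000_000  # default of hive.mapjoin.smalltable.filesize, in bytes


def would_convert_to_map_join(small_table_bytes: int,
                              auto_convert: bool = False,
                              threshold: int = SMALLTABLE_FILESIZE) -> bool:
    """Return True when a common join would be a candidate for map-join conversion.

    auto_convert mirrors hive.auto.convert.join (default false in these notes).
    """
    # Conversion is only considered when the feature is switched on
    # and the small table is below the size threshold.
    return auto_convert and small_table_bytes < threshold
```

With the defaults listed in these notes, no conversion happens until hive.auto.convert.join is set to true, regardless of how small the table is.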

hive.mapjoin.followby.gby.localtask.max.memory.usage

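•Description: when a map join is followed by a group by, the fraction of memory the local task may use to hold data; if the data is too large, it is not kept in memory

•Default: 0.55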

hive.mapjoin.localtask.max.memory.usage

•Description: the fraction of memory a local task may use

•Default: 0.90
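The two memory fractions, hive.mapjoin.localtask.max.memory.usage and hive.mapjoin.followby.gby.localtask.max.memory.usage, can be read as upper bounds on used memory divided by available memory. The sketch below shows that arithmetic; the function name and its inputs are hypothetical and do not correspond to any Hive API.

```python
# Illustrative check of a local task's memory usage against the two fractions.
LOCALTASK_MAX_USAGE = 0.90     # hive.mapjoin.localtask.max.memory.usage
FOLLOWBY_GBY_MAX_USAGE = 0.55  # hive.mapjoin.followby.gby.localtask.max.memory.usage


def within_memory_limit(used_bytes: int, max_bytes: int,
                        followed_by_group_by: bool = False) -> bool:
    """Return True while the used fraction stays at or below the applicable limit."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    # The stricter limit applies when the map join is followed by a group by.
    limit = FOLLOWBY_GBY_MAX_USAGE if followed_by_group_by else LOCALTASK_MAX_USAGE
    return used_bytes / max_bytes <= limit
```

For example, a local task using 600 MB out of 1000 MB is within the 0.90 limit but over the 0.55 limit that applies when a group by follows.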

hive.heartbeat.interval

•Description: the interval at which heartbeats are sent during map join and filter operations

•Default: 1000

hive.merge.size.per.task

•Description: the size of the files after merging

•Default: 256000000

hive.mergejob.maponly

•Description: whether to try to merge the output files with a map-only job

•Default: true

hive.merge.mapredfiles

•Description: whether to merge small files at the end of a map-reduce job

•Default: false

hive.merge.mapfiles

•Description: whether to merge small files at the end of a map-only job

•Default: true
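To get a feel for hive.merge.size.per.task, the sketch below estimates how many files remain after merging a given amount of output. It assumes, purely for illustration, that each merged file is filled up to the target size; the real merge step may behave differently.

```python
# Rough estimate of the number of files left after a merge step.
MERGE_SIZE_PER_TASK = 256_000_000  # default of hive.merge.size.per.task, in bytes


def estimate_merged_files(total_output_bytes: int,
                          size_per_task: int = MERGE_SIZE_PER_TASK) -> int:
    """Return ceil(total / size_per_task), with a minimum of one file."""
    if size_per_task <= 0:
        raise ValueError("size_per_task must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return max(1, -(-total_output_bytes // size_per_task))
```

Under this assumption, about 1 GB of small output files would end up as 4 merged files.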

hive.hwi.listen.host

•Description: the default host the Hive web UI listens on

•Default: 0.0.0.0

hive.hwi.listen.port

•Description: the port the web UI listens on

•Default: 9999

hive.exec.parallel.thread.number

•Description: the number of threads Hive can use to run jobs in parallel

•Default: 8

hive.exec.parallel

•Description: whether jobs are submitted in parallel

•Default: false

hive.exec.compress.output

•Description: whether the output is compressed

•Default: false

hive.mapred.mode

•Description: the restriction mode for MapReduce operations; in nonstrict mode, operations run without restrictions

•Default: nonstrict

hive.join.cache.size

•Description: the number of rows that can be kept in memory during a join

•Default: 25000

hive.mapjoin.cache.numrows

•Description: the number of rows a map join keeps in memory

•Default: 25000

hive.join.emit.interval

•Description: for joins, the number of rows Hive buffers before emitting the join result

•Default: 1000

hive.optimize.groupby

•Description: whether group-by aggregation takes advantage of bucketed tables

•Default: true

hive.fileformat.check

•Description: whether to check the format of input files

•Default: true

hive.metastore.client.connect.retry.delay

•Description: the interval between retries when a client connection fails

•Default: 1 second

hive.metastore.client.socket.timeout

•Description: the client socket timeout

•Default: 20 seconds

mapred.reduce.tasks

•Default: -1

•Description: the default number of reduce tasks per job

-1 means the number of reducers is determined automatically based on the job

hive.exec.reducers.bytes.per.reducer

•Default: 1000000000 (1 GB)

•Description: the amount of data each reducer receives

For example, if 10 GB of data is sent to the reduce phase, 10 reduce tasks are generated

hive.exec.reducers.max

•Default: 999

•Description: the maximum number of reducers
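The interplay of mapred.reduce.tasks, hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max can be sketched as a small calculation. This is a simplified model under the assumption that Hive divides the input size by the per-reducer size and caps the result; the function name is hypothetical and the real estimation may consider further factors.

```python
# Simplified model of how the reducer count is chosen from the three settings.
def estimate_reducers(input_bytes: int,
                      bytes_per_reducer: int = 1_000_000_000,  # hive.exec.reducers.bytes.per.reducer
                      max_reducers: int = 999,                 # hive.exec.reducers.max
                      reduce_tasks: int = -1) -> int:          # mapred.reduce.tasks
    """Return the number of reducers this simplified model would pick."""
    # An explicit positive mapred.reduce.tasks overrides the estimate.
    if reduce_tasks > 0:
        return reduce_tasks
    # Ceiling division in integers: ceil(input_bytes / bytes_per_reducer).
    estimated = -(-input_bytes // bytes_per_reducer)
    # At least one reducer, and never more than the configured maximum.
    return max(1, min(max_reducers, estimated))
```

With the defaults, 10 GB of reducer input gives 10 reducers, matching the example in these notes, while a 5 TB input is capped at 999.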

hive.metastore.warehouse.dir

•Default: /user/hive/warehouse

•Description: the default storage location for databases

hive.default.fileformat

•Default: textfile

•Description: the default file format

hive.map.aggr

•Default: true

•Description: map-side aggregation, comparable to a combiner

hive.exec.max.dynamic.partitions.pernode

•Default: 100

•Description: the maximum number of dynamic partitions each task node may create

hive.exec.max.dynamic.partitions

•Default: 1000

•Description: the maximum number of dynamic partitions that may be created in total
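The two dynamic partition limits, hive.exec.max.dynamic.partitions.pernode and hive.exec.max.dynamic.partitions, can be checked against an expected workload before running a job. The sketch below does this for a list of per-node partition counts. The function name is hypothetical, and summing the per-node counts is a simplification, since the same partition written by several nodes would be counted more than once.

```python
# Check expected dynamic partition counts against the two limits.
from typing import List


def check_dynamic_partitions(partitions_per_node: List[int],
                             max_per_node: int = 100,  # hive.exec.max.dynamic.partitions.pernode
                             max_total: int = 1000     # hive.exec.max.dynamic.partitions
                             ) -> List[str]:
    """Return the names of the settings the workload would exceed."""
    violations = []
    # Per-node limit: no single mapper/reducer node may create too many partitions.
    if any(count > max_per_node for count in partitions_per_node):
        violations.append("hive.exec.max.dynamic.partitions.pernode")
    # Total limit (simplified: assumes the nodes create distinct partitions).
    if sum(partitions_per_node) > max_total:
        violations.append("hive.exec.max.dynamic.partitions")
    return violations
```

An empty result means both defaults are respected; otherwise the returned names indicate which setting would need to be raised.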

hive.metastore.server.max.threads

•Default: 100000

•Description: the default maximum number of metastore worker threads

hive.metastore.server.min.threads

•Default: 200

•Description: the default minimum number of metastore worker threads
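Parameters like the ones listed here can be overridden for a single session with `SET key=value;` statements. The helper below, whose name is made up for this example, renders a Python dictionary of overrides into such statements so they can be pasted at the top of a script. Which values are sensible depends on the cluster; the ones in the usage note are only placeholders.

```python
# Render a dictionary of parameter overrides as Hive SET statements.
from typing import Dict, Union

Value = Union[bool, int, float, str]


def render_set_statements(settings: Dict[str, Value]) -> str:
    """Return one `SET key=value;` line per entry, in insertion order."""
    lines = []
    for key, value in settings.items():
        # Hive expects lowercase true/false rather than Python's True/False.
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"SET {key}={value};")
    return "\n".join(lines)
```

For instance, passing `{"hive.exec.dynamic.partition": True, "hive.exec.parallel.thread.number": 8}` yields two statements, one per parameter.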
