Hive Speculative Execution

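In a distributed cluster, program bugs (including bugs in Hadoop itself), load imbalance, or an uneven distribution of resources can cause the tasks of a single job to run at different speeds. Some tasks may run noticeably slower than the rest. For example, one task of a job may be only 50% complete while all the other tasks have already finished. Such tasks hold back the progress of the whole job.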

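To avoid this, Hadoop uses a mechanism called speculative execution. It applies a set of rules to identify the "straggler" tasks and launches a backup task for each one. The backup processes the same data in parallel with the original task, and the result of whichever attempt successfully finishes first is used as the final result.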
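The "identify the stragglers" step can be illustrated with a simplified Python sketch. It assumes the heuristic commonly described for early Hadoop releases: a task is a straggler if its progress is more than 0.2 below the average of its peers and it has been running for at least one minute. Newer Hadoop versions estimate completion times instead. The thresholds and the names `TaskAttempt`, `find_stragglers` and `estimate_remaining_s` are therefore illustrative assumptions, not Hadoop's API.

```python
from dataclasses import dataclass


# Hypothetical model of a running task attempt (not a Hadoop class).
@dataclass
class TaskAttempt:
    task_id: str
    progress: float   # fraction of the task's input processed, 0.0 to 1.0
    elapsed_s: float  # seconds since the attempt started


def find_stragglers(tasks, gap=0.2, min_elapsed_s=60.0):
    """Flag tasks whose progress trails the average by more than `gap`.

    Assumed heuristic: only tasks that have run for at least `min_elapsed_s`
    are considered, so freshly started tasks are not speculated immediately.
    """
    if not tasks:
        return []
    avg = sum(t.progress for t in tasks) / len(tasks)
    return [
        t for t in tasks
        if t.progress < avg - gap and t.elapsed_s >= min_elapsed_s
    ]


def estimate_remaining_s(task):
    """Extrapolate remaining time, assuming the task keeps its current rate."""
    if task.progress <= 0.0:
        return float("inf")
    return task.elapsed_s * (1.0 - task.progress) / task.progress


# Example: one map task is stuck at 50% while the others have finished.
tasks = [
    TaskAttempt("m_000000", 1.0, 120.0),
    TaskAttempt("m_000001", 1.0, 115.0),
    TaskAttempt("m_000002", 0.5, 120.0),
]
for t in find_stragglers(tasks):
    # A speculative scheduler would launch a backup attempt for this task here.
    print(t.task_id, estimate_remaining_s(t))  # prints: m_000002 120.0
```

The average progress here is about 0.83, so only the task at 0.5 falls below the assumed 0.63 cutoff.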

Hive can likewise make use of speculative execution.

The parameters that turn speculative execution on are configured in Hadoop's mapred-site.xml file:

<property>
  <name>mapreduce.map.speculative</name>
  <value>true</value>
  <description>If true, then multiple instances of some map tasks
  may be executed in parallel.</description>
</property>

<property>
  <name>mapreduce.reduce.speculative</name>
  <value>true</value>
  <description>If true, then multiple instances of some reduce tasks
  may be executed in parallel.</description>
</property>

Hive itself also provides a configuration property that controls reduce-side speculative execution:

<property>
  <name>hive.mapred.reduce.tasks.speculative.execution</name>
  <value>true</value>
  <description>Whether speculative execution for reducers should be turned on.</description>
</property>
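The Hadoop and Hive properties share the same `<property>`/`<name>`/`<value>` layout. A short Python sketch that uses only the standard library can therefore report which of these switches a given configuration file sets. The function names and the `SAMPLE` document are hypothetical. The sketch does not implement Hadoop's full configuration semantics, such as `final` properties or variable substitution. An unset key is reported as `None` because its effective value then comes from the defaults.

```python
import xml.etree.ElementTree as ET

# The three switches discussed above; all use the Hadoop <property> layout.
SPECULATIVE_KEYS = (
    "mapreduce.map.speculative",
    "mapreduce.reduce.speculative",
    "hive.mapred.reduce.tasks.speculative.execution",
)


def read_properties(xml_text):
    """Collect name/value pairs from a Hadoop-style <configuration> document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def speculative_flags(xml_text):
    """Map each speculative-execution key to True/False, or None if unset."""
    props = read_properties(xml_text)
    flags = {}
    for key in SPECULATIVE_KEYS:
        raw = props.get(key)
        # None: the file does not set the key, so the built-in default applies.
        flags[key] = None if raw is None else raw.lower() == "true"
    return flags


# Hypothetical mapred-site.xml content used for illustration.
SAMPLE = """<configuration>
  <property>
    <name>mapreduce.map.speculative</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.reduce.speculative</name>
    <value>false</value>
  </property>
</configuration>"""

print(speculative_flags(SAMPLE))
```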

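It is hard to give specific advice on tuning these speculative execution settings. If users are very sensitive to deviations in run time, they can turn these features off.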

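If users need long-running map or reduce tasks because the input data is very large, enabling speculative execution can be extremely wasteful. Each backup attempt has to reprocess the task's input from the beginning.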
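The cost argument can be made concrete with a small worked example in Python. It counts only the slot time spent while the original attempt and the backup run side by side. The durations (two minutes versus one, thirty minutes versus sixty) are made-up numbers for illustration. `speculation_outcome` is a hypothetical helper, not part of Hadoop or Hive.

```python
def speculation_outcome(original_remaining_s, backup_total_s):
    """Race an original attempt against a backup launched now (hypothetical helper).

    Times are measured from the moment the backup starts. The backup must
    reprocess the task's whole input, so backup_total_s grows with input size.
    Returns (winner, finish_s, wasted_slot_s, saved_s).
    """
    finish_s = min(original_remaining_s, backup_total_s)
    winner = "backup" if backup_total_s < original_remaining_s else "original"
    # Both attempts hold a slot until the first one finishes; the loser is then
    # killed, so its share of that window is duplicated work.
    wasted_slot_s = finish_s
    # How much earlier the task finishes than it would have without a backup.
    saved_s = original_remaining_s - finish_s
    return winner, finish_s, wasted_slot_s, saved_s


# Short task: a backup on a healthy node overtakes the straggler.
print(speculation_outcome(original_remaining_s=120, backup_total_s=60))
# ('backup', 60, 60, 60)

# Long task over a large input: the backup restarts from zero and never catches up.
print(speculation_outcome(original_remaining_s=1800, backup_total_s=3600))
# ('original', 1800, 1800, 0)
```

In the second case the cluster spends an extra 30 minutes of slot time and the task finishes no sooner. This is the kind of waste described for long-running tasks over large inputs.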
