Hive performance tuning

1. LIMIT tuning

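-- A query with a LIMIT clause normally still executes the entire query and only then returns part of the result; the settings below let Hive sample the source data instead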

set hive.limit.optimize.enable=true;

set hive.limit.row.max.size=10000;

set hive.limit.optimize.limit.file=10;
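As a minimal sketch of where these settings apply, assume a hypothetical table page_views (not part of the original). With the optimization enabled, Hive may answer a simple LIMIT query from a sample of the input, so rows outside the sample might never be considered.

```sql
-- page_views is a hypothetical table used only for illustration
set hive.limit.optimize.enable=true;
set hive.limit.row.max.size=10000;
set hive.limit.optimize.limit.file=10;

-- A simple LIMIT query that can be served from a sample of the input
select * from page_views limit 100;
```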

2. Join optimization

...

3. Local mode

-- Let Hive try to run queries in local mode where it can; otherwise Hive runs all other queries as MapReduce jobs

set hive.exec.mode.local.auto=true;
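Local mode is chosen only for small inputs. The sketch below shows the two thresholds that usually accompany this switch; the values are believed to be the defaults and should be checked against your Hive version.

```sql
set hive.exec.mode.local.auto=true;

-- Run locally only if the total input size is below this many bytes (128 MB here)
set hive.exec.mode.local.auto.inputbytes.max=134217728;

-- ...and the number of input files (map tasks) is below this limit
set hive.exec.mode.local.auto.input.files.max=4;
```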

4. Parallel execution

set hive.exec.parallel=true;
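Parallel execution helps only when a query compiles into stages that do not depend on each other. A sketch, assuming two hypothetical tables orders_2023 and orders_2024: the two aggregations are independent, so their stages may run at the same time. hive.exec.parallel.thread.number caps how many stages run concurrently.

```sql
-- orders_2023 and orders_2024 are hypothetical tables
set hive.exec.parallel=true;
set hive.exec.parallel.thread.number=8;

-- The two branches of the union are independent of each other
select u.src, u.cnt
from (
  select 'y2023' as src, count(*) as cnt from orders_2023
  union all
  select 'y2024' as src, count(*) as cnt from orders_2024
) u;
```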

5. Strict mode

-- In strict mode, a query on a partitioned table is rejected at submission if its WHERE clause has no partition filter (default: nonstrict)

set hive.mapred.mode=strict;

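Note: strict mode blocks three types of query:

(1) Queries on a partitioned table that do not filter on a partition column are rejected.

(2) An ORDER BY clause must be accompanied by a LIMIT clause.

(3) Cartesian-product queries are restricted (joins written with WHERE instead of ON).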
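The three restrictions can be illustrated with a sketch. The tables logs (assumed to be partitioned by dt) and users are hypothetical. Each rejected form is shown as a comment, followed by a form that strict mode accepts.

```sql
-- logs (partitioned by dt) and users are hypothetical tables
set hive.mapred.mode=strict;

-- (1) Rejected: no partition filter
-- select * from logs;
select * from logs where dt = '2020-01-01';

-- (2) Rejected: ORDER BY without LIMIT
-- select user_id from logs where dt = '2020-01-01' order by user_id;
select user_id from logs where dt = '2020-01-01' order by user_id limit 100;

-- (3) Rejected: join condition given only in WHERE
-- select l.user_id, u.name from logs l join users u
-- where l.dt = '2020-01-01' and l.user_id = u.user_id;
select l.user_id, u.name
from logs l join users u on l.user_id = u.user_id
where l.dt = '2020-01-01';
```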

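6. Maximum number of reducers

-- Suggested upper bound; the right-hand side is a formula to evaluate, not a literal value

set hive.exec.reducers.max=(total reduce slots in the cluster * 1.5) / (average number of queries running)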
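As a worked example with invented figures: on a cluster with 200 reduce slots and an average of 6 queries running, the formula gives (200 * 1.5) / 6 = 50.

```sql
-- Assumed figures for illustration: 200 reduce slots, 6 queries running on average
-- (200 * 1.5) / 6 = 50
set hive.exec.reducers.max=50;
```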

7. JVM reuse

set mapred.job.reuse.jvm.num.tasks=10; -- 10 is the number of tasks that may reuse one JVM

8. Indexes

Indexes can speed up queries that contain a GROUP BY clause

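9. Dynamic partition tuning

-- Dynamic partition property: set to true to enable dynamic partitioning (default: false)

set hive.exec.dynamic.partition=true;

-- Dynamic partition property: nonstrict allows every partition to be dynamic (default: strict)

-- strict requires at least one partition to be static

set hive.exec.dynamic.partition.mode=strict;

-- Dynamic partition property: maximum number of dynamic partitions each mapper or reducer may create

set hive.exec.max.dynamic.partitions.pernode=100;

-- Dynamic partition property: maximum number of dynamic partitions a single dynamic-partition statement may create

set hive.exec.max.dynamic.partitions=1000;

-- Dynamic partition property: maximum number of files that may be created globally

set hive.exec.max.created.files=100000;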

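-- Controls how many files a DataNode can have open at one time

-- This parameter must be set in $HADOOP_HOME/conf/hdfs-site.xml on the DataNodes

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>8192</value>
</property>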
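A sketch of how the two partition modes differ in practice. The tables sales_raw and sales_part are hypothetical, and sales_part is assumed to be partitioned by (country, dt). Dynamic partition columns go last in the SELECT list, in the same order as in the PARTITION clause.

```sql
-- sales_raw and sales_part are hypothetical tables;
-- sales_part is assumed to be partitioned by (country, dt)
set hive.exec.dynamic.partition=true;

-- strict mode: at least one partition (country) is given statically
set hive.exec.dynamic.partition.mode=strict;
insert overwrite table sales_part partition (country = 'US', dt)
select order_id, amount, dt
from sales_raw
where country = 'US';

-- nonstrict mode: every partition column may be dynamic
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table sales_part partition (country, dt)
select order_id, amount, country, dt
from sales_raw;
```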

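10. Speculative execution

-- Purpose: improve overall job efficiency by obtaining the result of an individual task sooner, and by detecting slow TaskTrackers and adding them to a blacklist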

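(1) Edit $HADOOP_HOME/conf/mapred-site.xml

<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>true</value>
</property>

<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>true</value>
</property>

(2) Change the Hive configuration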

set hive.mapred.reduce.tasks.speculative.execution=true;

11. Multiple GROUP BY in a single MapReduce job

-- Assembles multiple GROUP BY operations into a single MapReduce job; true enables this (default: false)

set hive.multigroupby.singlemr=true;
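The kind of query this setting targets is a multi-insert whose GROUP BY clauses share the same key. A sketch with hypothetical tables page_views, views_per_user and bytes_per_user; whether the plan really collapses into one job depends on the Hive version and is best confirmed with EXPLAIN.

```sql
-- page_views, views_per_user and bytes_per_user are hypothetical tables
set hive.multigroupby.singlemr=true;

-- Two aggregations over the same source, both grouped by user_id
from page_views
insert overwrite table views_per_user
  select user_id, count(*) group by user_id
insert overwrite table bytes_per_user
  select user_id, sum(bytes) group by user_id;
```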

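12. Virtual columns

-- When Hive produces unexpected or NULL results, virtual columns can help diagnose which row of data is causing the problem

INPUT__FILE__NAME (name of the input file)

BLOCK__OFFSET__INSIDE__FILE (block offset within the file)

ROW__OFFSET__INSIDE__BLOCK (row offset; enable with set hive.exec.rowoffset=true;)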
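A sketch of such a diagnostic query, assuming a hypothetical table raw_events with a column amount that unexpectedly comes back NULL. Selecting the virtual columns next to the suspect column shows which file and position each bad row came from.

```sql
-- raw_events and its column amount are hypothetical
set hive.exec.rowoffset=true;

select INPUT__FILE__NAME,
       BLOCK__OFFSET__INSIDE__FILE,
       ROW__OFFSET__INSIDE__BLOCK,
       amount
from raw_events
where amount is null;
```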

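13. Other parameters

-- Show the current database name in the CLI prompt

set hive.cli.print.current.db=true;

-- Make the CLI print column names

set hive.cli.print.header=true;

-- Improve aggregation performance

set hive.map.aggr=true;

-- Simple queries that need no aggregation, such as SELECT ... FROM ... LIMIT n, do not need to start a MapReduce job; a fetch task reads the data directly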

set hive.fetch.task.conversion=more;
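A sketch of a query that is a candidate for a fetch task under this setting. The table page_views, assumed to be partitioned by dt, is hypothetical. The query only projects columns, filters, and limits, with no aggregation.

```sql
-- page_views is a hypothetical table, assumed to be partitioned by dt
set hive.fetch.task.conversion=more;

-- Column projection, a filter, and LIMIT: no aggregation involved
select user_id, url
from page_views
where dt = '2020-01-01'
limit 10;
```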
