Hive Knowledge: Optimization Techniques

1. Use GROUP BY instead of DISTINCT for deduplication

select user_name from trade group by user_name;
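
The same idea is often applied to distinct counts, where a single count(distinct ...) is commonly reported to push all rows through one reducer. The following is a minimal sketch that reuses the trade table and user_name column from the query above; the alias t is introduced only for illustration, and the actual gain depends on the data and the Hive version.

```sql
-- Distinct count written directly with DISTINCT
select count(distinct user_name) from trade;

-- Alternative: deduplicate with GROUP BY first, then count the groups.
-- count(t.user_name) skips the NULL group, matching count(distinct ...),
-- which also ignores NULL values.
select count(t.user_name)
from (
    select user_name
    from trade
    group by user_name
) t;
```

The second form costs an extra stage, so it tends to pay off only when the table is large.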

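2. Use MAPJOIN

-- table_a is the small table that gets loaded into memory;
-- because it is aliased, the hint refers to it by its alias
select /*+ mapjoin(a) */
    a.*,
    b.*
from table_a a
join table_b b
on a.id = b.id;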
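
Depending on the Hive version, the map join may also be requested through configuration instead of a hint. The sketch below assumes a reasonably recent Hive release in which automatic conversion is available; the threshold value shown is the documented default and is only an example.

```sql
-- Let Hive convert eligible joins to map joins automatically
set hive.auto.convert.join=true;
-- Size threshold in bytes below which a table is treated as "small"
set hive.mapjoin.smalltable.filesize=25000000;

-- No hint needed: Hive decides based on the table sizes
select a.*, b.*
from table_a a
join table_b b
on a.id = b.id;
```

In releases where hive.ignore.mapjoin.hint defaults to true, the hint form is ignored unless that property is set to false.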

3. Use DISTINCT + UNION ALL instead of UNION

-- count the distinct rows of the combined result
select count(distinct order_id, user_id, order_type)
from (
    select order_id, user_id, order_type from orders where order_type = '0'
    union all
    select order_id, user_id, order_type from orders where order_type = '1'
    union all
    select order_id, user_id, order_type from orders where order_type = '1'
) a;

4. When aggregating by group, use GROUPING operators where appropriate
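
As one possible illustration, GROUPING SETS can compute several aggregation levels in a single query instead of a UNION ALL of separate GROUP BY queries. This sketch reuses the orders table and its columns from section 3; the alias order_cnt is an assumption made for the example.

```sql
-- One query produces three aggregation levels:
--   per (user_id, order_type), per user_id, and the grand total
select user_id,
       order_type,
       count(*) as order_cnt
from orders
group by user_id, order_type
grouping sets ((user_id, order_type), (user_id), ());
```

Columns that are not part of a given grouping set come back as NULL in that set's rows.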

5. Enable parallel execution when using UNION ALL
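
A minimal sketch of the settings involved, assuming the branches of the UNION ALL are independent stages and the cluster has spare capacity. The thread count shown is the documented default, not a recommendation.

```sql
-- Allow independent stages of one query to run at the same time
set hive.exec.parallel=true;
-- Upper bound on the number of stages run in parallel
set hive.exec.parallel.thread.number=8;

select order_id, user_id, order_type from orders where order_type = '0'
union all
select order_id, user_id, order_type from orders where order_type = '1';
```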

6. Use functions to convert rows to columns and columns to rows
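
The sketch below shows one common way to do both conversions with built-in functions. The tables user_tags(user_id, tag) and user_tag_list(user_id, tags) are hypothetical, and tag is assumed to be a string column.

```sql
-- Many rows to one row: collect each user's tags into a comma-separated string
select user_id,
       concat_ws(',', collect_set(tag)) as tags
from user_tags
group by user_id;

-- One row to many rows: split a string such as 'a,b,c' and emit one row per tag
select user_id, tag
from user_tag_list
lateral view explode(split(tags, ',')) t as tag;
```

collect_set drops duplicate values; collect_list can be used instead when duplicates should be kept.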

7. Table join optimization

8. Filter optimization
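
One frequently cited form of filter optimization is to filter each input before joining, so that fewer rows reach the join. This is a sketch only; the partition column ds and the column val are hypothetical additions to the table_a and table_b used in section 2.

```sql
-- Filter both sides in subqueries before the join
select a.id, b.val
from (
    select id
    from table_a
    where ds = '2020-01-01'
) a
join (
    select id, val
    from table_b
    where ds = '2020-01-01'
) b
on a.id = b.id;
```

Selecting only the needed columns in each subquery also reduces the data read and shuffled.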

9. Resolve data skew

9.1 Symptoms of data skew

9.2 Causes of data skew and their solutions

(1) Data skew caused by null values

select ... from ... a join ... b on a.user_id = b.user_id and a.user_id is not null
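
When the rows with a null key must be kept in the result, a different and commonly used technique is to replace the null key with a random value so those rows spread across reducers instead of piling onto one. The sketch assumes hypothetical tables logs and users, a string-typed user_id, and that no real user_id starts with 'null_'.

```sql
-- Null keys become unique random strings that match nothing in users,
-- so those rows keep NULLs on the right side but are spread evenly
select a.*, b.*
from logs a
left join users b
on case when a.user_id is null
        then concat('null_', cast(rand() as string))
        else a.user_id
   end = b.user_id;
```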

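(2) Joining a large table with a small table (one table is very large, the other very small)

-- table_a is the small table; the hint refers to it by its alias
select /*+ mapjoin(a) */
    b.*
from table_a a
join table_b b
on a.id = b.id;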

(3) The join-key columns of the two tables have inconsistent data types

select ... from ... a join ... b on a.user_id = cast(b.user_id as int)
