Working Around JOIN ... OR in Hive SQL

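Hive SQL does not support OR in the condition of a JOIN ... ON clause (this applies to older releases; according to the Hive joins manual, complex join expressions are accepted from Hive 2.2.0 onward). The workaround is to run one join per alternative condition, combine the results with UNION ALL, deduplicate them, and insert the result into the target table.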

insert into table tablename1 partition (stat_month='$', stat_date='$', stat_hour='$')
select
    -- every column not listed in GROUP BY is wrapped in collect_set(...)[0]
    collect_set(column_name)[0], collect_set(column_name)[0], collect_set(column_name)[0], ............., collect_set(column_name)[0]
from (
    -- branch 1: first alternative of the OR condition
    select a.col, b.col
    from tablenamea a
    left join tablenameb b on a.column_name = b.column_name_a
    where stat_month='$' and stat_date='$' and ..............
    union all
    -- branch 2: second alternative of the OR condition
    select a.col, b.col
    from tablenamea a
    left join tablenameb b on a.column_name = b.column_name_b
    where stat_month='$' and stat_date='$' and ..............
    union all
    -- branch 3: third alternative of the OR condition
    select a.col, b.col
    from tablenamea a
    left join tablenameb b on a.column_name = b.column_name_c
    where stat_month='$' and stat_date='$' and ..............
) newtablename
group by dedup_column_name;
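To make the template above concrete, here is a minimal sketch with two branches. The tables `orders(order_id, contact)` and `users(user_id, phone, email)` and all of their columns are hypothetical names invented for illustration; they are not part of the original post.

```sql
-- Hypothetical tables (assumptions for illustration only):
--   orders(order_id, contact)
--   users(user_id, phone, email)
--
-- Intended logic, which older Hive versions reject:
--   FROM orders o LEFT JOIN users u
--     ON o.contact = u.phone OR o.contact = u.email

SELECT
    order_id,
    collect_set(contact)[0] AS contact,   -- same value in every branch
    collect_set(user_id)[0] AS user_id    -- first non-NULL match, if any
FROM (
    -- branch 1: match on phone
    SELECT o.order_id, o.contact, u.user_id
    FROM orders o
    LEFT JOIN users u ON o.contact = u.phone
    UNION ALL
    -- branch 2: match on email
    SELECT o.order_id, o.contact, u.user_id
    FROM orders o
    LEFT JOIN users u ON o.contact = u.email
) t
GROUP BY order_id;
```

Because collect_set() skips NULLs, an order that matches in only one branch still gets its user_id, and an order that matches in neither branch gets NULL. One caveat: if an order matches several different users, this query keeps just one of them (which one is not guaranteed), whereas a true OR join would return one row per match.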

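Summary: (1) Every non-aggregated column in the select list of a GROUP BY query must also appear in the GROUP BY clause; otherwise Hive reports an error.

(2) Columns that should not take part in the grouping can be wrapped in collect_set(), for example collect_set(a_id)[0], which resolves the situation in (1). Keep in mind that collect_set() drops duplicates and NULLs, and that [0] returns an arbitrary element when a group holds several distinct values.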
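The two points can be illustrated with a small sketch. The table `t(id, name, score)` is a hypothetical example, not something from the original post.

```sql
-- Hypothetical table (assumption for illustration only): t(id, name, score)

-- Rejected by Hive: name is neither aggregated nor listed in GROUP BY
-- (the error reads along the lines of "Expression not in GROUP BY key").
-- SELECT id, name, max(score) FROM t GROUP BY id;

-- Fix per point (1): add the column to GROUP BY.
-- Note that this changes the grouping granularity to (id, name).
SELECT id, name, max(score) AS max_score
FROM t
GROUP BY id, name;

-- Fix per point (2): keep grouping by id only and wrap the extra column.
SELECT id, collect_set(name)[0] AS name, max(score) AS max_score
FROM t
GROUP BY id;
```

The second form is the one to use for deduplication, since it yields exactly one row per id. It only makes sense when the wrapped column has a single value per group, or when any one of its values is acceptable.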

