Using grouping sets in Hive

I learned how to use grouping sets from this blog post, which I found very clear.

Link

Put simply, grouping sets is a convenient way to write several GROUP BY clauses in a single SQL statement.

select
  a, b, c,
  grouping__id,
  count(a)
from
  tablename
group by  -- declare every column used by any grouping set
  a, b, c
grouping sets
(
  (a, c),
  (a, b),
  (b, c),
  (c)
)

The (a,c), (a,b), (b,c), (c) inside grouping sets are four GROUP BY combinations. The statement is equivalent to writing four SQL queries, each with a different GROUP BY.
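To make that equivalence concrete, here is a rough sketch of the same query spelled out as four separate GROUP BY queries joined with union all. It reuses tablename and the columns a, b, c from the example; the alias cnt is only an illustrative name, and the group id column is left out. Depending on the Hive version and the column types, the bare null placeholders may need an explicit cast, and older releases may require the union to be wrapped in a subquery.

```sql
-- Sketch only: each branch corresponds to one grouping set.
-- Columns that are not part of a branch's GROUP BY are returned as NULL.
select a, null as b, c, count(a) as cnt from tablename group by a, c   -- (a, c)
union all
select a, b, null as c, count(a) as cnt from tablename group by a, b   -- (a, b)
union all
select null as a, b, c, count(a) as cnt from tablename group by b, c   -- (b, c)
union all
select null as a, null as b, c, count(a) as cnt from tablename group by c;  -- (c)
```

The grouping sets form says the same thing in one statement and names the source table only once, which is the convenience the author is pointing at.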

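The group id (exposed in Hive as the virtual column grouping__id, with two underscores) tells you which GROUP BY each output row belongs to. It is a bitmask: for each column declared after group by, in order, one bit records whether that column is part of the current grouping set. For (a,c), the bits for a, b, c are 1, 0, 1, so the group id is binary 101, i.e. 5. Note that this matches older Hive releases. From Hive 2.3.0 on, the bits are flipped to follow the SQL standard (a bit is 1 when the column is not in the grouping set, and the first column is the highest bit), so (a,c) yields binary 010, i.e. 2.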
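As an illustration of how the group id can be read back, the sketch below labels each row with the grouping set that produced it. It assumes the older (pre-2.3.0) numbering in which (a,c) is 5, with a contributing 1, b contributing 2 and c contributing 4 when present in the grouping set. The column names grouping_set and cnt are made up for this example, and the constants would have to change on Hive 2.3.0 or later.

```sql
-- Assumption: pre-2.3.0 Hive numbering, where a set bit means the column
-- IS part of the grouping set (a -> 1, b -> 2, c -> 4).
select
  a, b, c,
  grouping__id,
  case grouping__id
    when 5 then 'group by a, c'   -- binary 101
    when 3 then 'group by a, b'   -- binary 011
    when 6 then 'group by b, c'   -- binary 110
    when 4 then 'group by c'      -- binary 100
  end as grouping_set,            -- hypothetical label column
  count(a) as cnt
from tablename
group by a, b, c
grouping sets ((a, c), (a, b), (b, c), (c));
```

Running the query once and inspecting the grouping__id values that actually come back is a cheap way to confirm which numbering a given cluster uses before hard-coding constants like these.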

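The select list contains all of a, b, c, yet normally a query with GROUP BY must not select columns that are not being grouped on. So this needs spelling out: when the engine processes the grouping set (a,c) while the select list is a, b, c, it makes a copy of every row read from the from table with b set to NULL. In other words, b is NULL in every result row that belongs to that grouping set.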

1. Aggregate all detail rows three ways, by year, by month, and by year and month together, and return each set of totals

select year, month, sum(cost) as all_cost, grouping__id
from
(
  select year(orderdatetime) as year,
         month(orderdatetime) as month,
         cost
  from analysis_daily_expense
) middle
group by year, month
grouping sets (year, month, (year, month))
order by grouping__id;

The result of running it is as follows:
