Hive: solving a window function problem

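Suppose a page of search results contains both ad slots and non-ad slots.

As a user browses, every item is shown at some position in the results. We can model this with the following table: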

create table window_test_table (
  id int,         -- user id
  sq string,      -- identifies each item
  cell_type int,  -- item type, e.g. ad or non-ad
  rank int        -- position of the item in this search: the first item is 1, then 2, 3, 4, ...
)
row format delimited fields terminated by ',';

Load the data:

1,flower,10,1
1,tree,26,3
1,hive,10,4
1,hadoop,13,5
1,spark,26,6
1,flink,14,7
1,sqoop,10,8

load data local inpath '/home/hadoop/data/window' into table window_test_table;

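Assume cell_type 26 marks an ad. For each user and each browsing session, we want the natural rank of the non-ad items, with ads left as null, like this: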

1,flower,10,1
1,tree,26,null
1,hive,10,3
1,hadoop,13,4
1,spark,26,null
1,flink,14,5
1,sqoop,10,6

select id,
       sq,
       cell_type,
       case
         when cell_type = 26 then null
         else row_number() over (partition by id order by rank)
       end rank
from window_test_table;

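The result is not the ranking we wanted. row_number() still counts the ad rows, so the numbers given to the non-ad rows skip a value wherever an ad appears.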

Let's look at the execution plan of this query:
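The plan is obtained by prefixing the statement with explain. A minimal sketch, assuming the first query is used unchanged:

```sql
-- Print the execution plan instead of running the query.
explain
select id,
       sq,
       cell_type,
       case
         when cell_type = 26 then null
         else row_number() over (partition by id order by rank)
       end rank
from window_test_table;
```

The exact operators and statistics in the output depend on the Hive version and execution engine, so a plan from another cluster may differ in detail from the one shown here.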

stage dependencies:
  stage-1 is a root stage
  stage-0 depends on stages: stage-1

stage plans:
  stage: stage-1
    tez
      edges:
        reducer 2
      dagname: hadoop_20190331200315_a6425b27-68cd-4f04-b67d-d38ae2fc8207:21
      vertices:
        map 1
            map operator tree:
                tablescan
                  alias: window_test_table
                  statistics: num rows: 1 data size: 104 basic stats: complete column stats: none
                  reduce output operator
                    key expressions: id (type: int), rank (type: int)
                    sort order: ++
                    map-reduce partition columns: id (type: int)
                    statistics: num rows: 1 data size: 104 basic stats: complete column stats: none
                    value expressions: sq (type: string), cell_type (type: int)
        reducer 2
            reduce operator tree:
              select operator
                expressions: key.reducesinkkey0 (type: int), value._col0 (type: string), value._col1 (type: int), key.reducesinkkey1 (type: int)
                outputcolumnnames: _col0, _col1, _col2, _col3
                statistics: num rows: 1 data size: 104 basic stats: complete column stats: none
                ptf operator
                  function definitions:
                      input definition
                        input alias: ptf_0
                        output shape: _col0: int, _col1: string, _col2: int, _col3: int
                        type: windowing
                      windowing table definition
                        input alias: ptf_1
                        name: windowingtablefunction
                        order by: _col3
                        partition by: _col0
                        raw input shape:
                        window functions:
                            window function definition
                              alias: row_number_window_0
                              name: row_number
                              window function: genericudafrownumberevaluator
                              window frame: preceding(max)~following(max)
                              ispivotresult: true
                  statistics: num rows: 1 data size: 104 basic stats: complete column stats: none
                  select operator
                    expressions: _col0 (type: int), _col1 (type: string), _col2 (type: int), case when ((_col2 = 26)) then (null) else (row_number_window_0) end (type: int)
                    outputcolumnnames: _col0, _col1, _col2, _col3
                    statistics: num rows: 1 data size: 104 basic stats: complete column stats: none
                    file output operator
                      compressed: false
                      statistics: num rows: 1 data size: 104 basic stats: complete column stats: none
                      table:
                          input format: org.apache.hadoop.mapred.textinputformat
                          output format: org.apache.hadoop.hive.ql.io.hiveignorekeytextoutputformat
                          serde: org.apache.hadoop.hive.serde2.lazy.lazy******serde

  stage: stage-0
    fetch operator
      limit: -1
      processor tree:
        listsink

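The plan shows that the case when is evaluated after the window function. The PTF operator first computes row_number over every row of the partition (partition by _col0, order by _col3), ads included, and only the select operator that follows applies the case when to that result.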
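Put differently, the first query behaves roughly like the nested form below. This is only an illustrative sketch of the evaluation order, not something Hive literally generates; the column alias rn and the subquery alias t are made up for the example.

```sql
-- Outer step: the case when only masks ad rows; it cannot change rn.
select id,
       sq,
       cell_type,
       case when cell_type = 26 then null else rn end as rank
from (
  -- Inner step: every row of the user, ads included, gets a number.
  select id,
         sq,
         cell_type,
         rank,
         row_number() over (partition by id order by rank) as rn
  from window_test_table
) t;
```

Because the ad rows are numbered in the inner step, masking them afterwards leaves holes in the sequence. To get a gapless rank, the ad rows have to be kept out of the window partition itself.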

Rewrite the query as:

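select id,
       sq,
       cell_type,
       case
         when cell_type != 26 then
           -- ad rows get a random partition key, so only the non-ad rows
           -- of a user share the partition keyed by id
           row_number() over (
             partition by case when cell_type != 26 then id else rand() end
             order by rank
           )
         else null
       end nature_rank
from window_test_table;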

This gives the non-ad items a consecutive natural rank and leaves the ad rows as null.
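A variation on the same idea, not taken from the original query, is to replace rand() with a deterministic ad/non-ad flag as a second partition key. The sketch below assumes, as above, that cell_type 26 is the only ad type; the flag values 0 and 1 are arbitrary.

```sql
-- Sketch of an alternative: partition by id plus an ad/non-ad flag,
-- so ads and non-ads of the same user are numbered separately.
select id,
       sq,
       cell_type,
       case
         when cell_type != 26 then
           row_number() over (
             partition by id, case when cell_type != 26 then 0 else 1 end
             order by rank
           )
         else null
       end nature_rank
from window_test_table;
```

Here all ad rows of a user land in one partition whose numbers are discarded by the outer case when, whereas the rand() version scatters them across many partitions. Which of the two behaves better on large or skewed data is something to measure rather than assume.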
