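In an interview today, the interviewer gave me this scenario:
There are two tables. One stores the fleet ID, team ID, and driver ID.
The other stores the driver ID, operating time, and operating mileage.
The task: for July, find the top 3 drivers by total mileage in each team of each fleet.
This calls for the row_number() function.
First, create the two tables the scenario describes.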
create table demo_of_topn_car (
    companyid varchar(8),  -- fleet ID
    classid   varchar(8),  -- team ID
    driverid  varchar(8)   -- driver ID
);

create table demo_of_topn_operating (
    driverid  varchar(8),   -- driver ID
    operadate datetime,     -- operating time
    mileage   decimal(5,2)  -- operating mileage
);
Step 1: compute each driver's total mileage within each team of each fleet.
select car.companyid, car.classid, operat.driverid, sum(operat.mileage) as mileage
from (
    -- one row per fleet/team pair
    select companyid, classid
    from demo_of_topn_car
    group by companyid, classid
) car
left join (
    -- every driver with their July trips; a driver with no trips keeps a null mileage
    select a.companyid, a.classid, a.driverid, b.mileage
    from demo_of_topn_car a
    left join demo_of_topn_operating b
        on a.driverid = b.driverid
        and month(b.operadate) = 7  -- July only; the scenario does not give a year
) operat
    on car.companyid = operat.companyid
    and car.classid = operat.classid
group by car.companyid, car.classid, operat.driverid
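To try the aggregation without a database server, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. Several things in it are assumptions rather than part of the original scenario: the sample rows (including the year 2023) are made up, SQLite's strftime serves as the July filter, and the query is written as a single left join, which produces the same groups as the nested form because every fleet/team pair already comes from demo_of_topn_car.

```python
import sqlite3


def build_demo_db():
    """Create the two tables in memory and load hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table demo_of_topn_car (
            companyid varchar(8), classid varchar(8), driverid varchar(8));
        create table demo_of_topn_operating (
            driverid varchar(8), operadate datetime, mileage decimal(5,2));
    """)
    conn.executemany(
        "insert into demo_of_topn_car values (?, ?, ?)",
        [("c1", "k1", "d1"), ("c1", "k1", "d2"), ("c1", "k2", "d3")],
    )
    conn.executemany(
        "insert into demo_of_topn_operating values (?, ?, ?)",
        [
            ("d1", "2023-07-01 08:00:00", 100.5),
            ("d1", "2023-07-02 08:00:00", 50.0),
            ("d2", "2023-07-01 09:00:00", 80.0),
            ("d2", "2023-06-30 09:00:00", 999.0),  # June trip: must not be counted
        ],
    )
    return conn


# Total July mileage per fleet, team, and driver.
# The month filter sits in the join condition so that drivers
# without July trips still appear, with a null total.
TOTALS_SQL = """
select a.companyid, a.classid, a.driverid, sum(b.mileage) as mileage
from demo_of_topn_car a
left join demo_of_topn_operating b
       on a.driverid = b.driverid
      and strftime('%m', b.operadate) = '07'
group by a.companyid, a.classid, a.driverid
order by a.companyid, a.classid, a.driverid
"""

totals = build_demo_db().execute(TOTALS_SQL).fetchall()
for row in totals:
    print(row)
```

Driver d3 has no trips at all, so the row for d3 carries a null (None) total instead of disappearing; that is the effect of the left join.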
Next, partition the result set by fleet and team and rank the rows within each partition.
select companyid, classid, driverid,
       row_number() over (partition by companyid, classid order by mileage desc) as rid
from (
    -- step 1: total mileage per fleet, team, and driver
    -- (ordering belongs in the window clause; a derived table needs no order by)
    select car.companyid, car.classid, operat.driverid, sum(operat.mileage) as mileage
    from (
        select companyid, classid
        from demo_of_topn_car
        group by companyid, classid
    ) car
    left join (
        select a.companyid, a.classid, a.driverid, b.mileage
        from demo_of_topn_car a
        left join demo_of_topn_operating b
            on a.driverid = b.driverid
            and month(b.operadate) = 7  -- July only; the scenario does not give a year
    ) operat
        on car.companyid = operat.companyid
        and car.classid = operat.classid
    group by car.companyid, car.classid, operat.driverid
) t1
Remember: for top N, the window must order descending, so that rid 1 is the largest value.
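The effect of the sort direction is easy to check on a tiny data set. The sketch below is an illustration under assumptions: the totals table and its rows are hypothetical, and it runs on SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-aggregated totals, one row per driver.
conn.execute(
    "create table totals (companyid text, classid text, driverid text, mileage real)"
)
conn.executemany(
    "insert into totals values (?, ?, ?, ?)",
    [
        ("c1", "k1", "d1", 150.5),
        ("c1", "k1", "d2", 80.0),
        ("c1", "k1", "d3", 120.0),
        ("c1", "k2", "d4", 60.0),
    ],
)


def ranked(direction):
    """Number the drivers of each team by mileage in the given direction."""
    # The keyword is interpolated into the SQL text, so accept only these two.
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    sql = f"""
        select classid, driverid,
               row_number() over (
                   partition by companyid, classid
                   order by mileage {direction}) as rid
        from totals
        order by classid, rid
    """
    return conn.execute(sql).fetchall()


desc_rank = ranked("desc")  # rid 1 = highest mileage in the team
asc_rank = ranked("asc")    # rid 1 = lowest mileage in the team
print(desc_rank)
print(asc_rank)
```

With desc, rid 1 in team k1 goes to d1 (the highest total); with asc it goes to d2 (the lowest), so a filter such as rid <= 3 would return the bottom of the list instead of the top.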
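Finally, the rid column cannot be filtered in the same select that computes it, so wrap the query in an outer select or use a with clause: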
with res as (
    select companyid, classid, driverid,
           row_number() over (partition by companyid, classid order by mileage desc) as rid
    from (
        -- step 1: total mileage per fleet, team, and driver
        select car.companyid, car.classid, operat.driverid, sum(operat.mileage) as mileage
        from (
            select companyid, classid
            from demo_of_topn_car
            group by companyid, classid
        ) car
        left join (
            select a.companyid, a.classid, a.driverid, b.mileage
            from demo_of_topn_car a
            left join demo_of_topn_operating b
                on a.driverid = b.driverid
                and month(b.operadate) = 7  -- July only; the scenario does not give a year
        ) operat
            on car.companyid = operat.companyid
            and car.classid = operat.classid
        group by car.companyid, car.classid, operat.driverid
    ) t1
)
-- keep only the three highest-ranked drivers of each team
select * from res where rid <= 3
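The whole pipeline can be exercised end to end with a small script. This is a sketch under stated assumptions, not the exact query above: it runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), the sample rows and the year 2023 are invented, strftime is used as the July filter, and the aggregation is split into its own CTE named totals for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table demo_of_topn_car (
        companyid varchar(8), classid varchar(8), driverid varchar(8));
    create table demo_of_topn_operating (
        driverid varchar(8), operadate datetime, mileage decimal(5,2));
""")
# Hypothetical data: team k1 has four drivers, team k2 has one.
conn.executemany(
    "insert into demo_of_topn_car values (?, ?, ?)",
    [("c1", "k1", "d1"), ("c1", "k1", "d2"), ("c1", "k1", "d3"),
     ("c1", "k1", "d4"), ("c1", "k2", "d5")],
)
conn.executemany(
    "insert into demo_of_topn_operating values (?, ?, ?)",
    [
        ("d1", "2023-07-03 08:00:00", 90.0),
        ("d1", "2023-07-04 08:00:00", 30.0),
        ("d2", "2023-07-03 09:00:00", 70.0),
        ("d3", "2023-07-05 10:00:00", 110.0),
        ("d4", "2023-07-06 11:00:00", 40.0),
        ("d4", "2023-08-01 11:00:00", 500.0),  # August trip: outside the window
        ("d5", "2023-07-10 12:00:00", 20.0),
    ],
)

TOP3_SQL = """
with totals as (
    -- July mileage per fleet, team, and driver
    select a.companyid, a.classid, a.driverid, sum(b.mileage) as mileage
    from demo_of_topn_car a
    left join demo_of_topn_operating b
           on a.driverid = b.driverid
          and strftime('%m', b.operadate) = '07'
    group by a.companyid, a.classid, a.driverid
),
res as (
    -- number the drivers of each team, highest mileage first
    select companyid, classid, driverid, mileage,
           row_number() over (
               partition by companyid, classid
               order by mileage desc) as rid
    from totals
)
select companyid, classid, driverid, mileage, rid
from res
where rid <= 3
order by companyid, classid, rid
"""

top3 = conn.execute(TOP3_SQL).fetchall()
for row in top3:
    print(row)
```

Driver d4 drops out of team k1 because the August trip is not counted and the remaining 40 ranks fourth. Note that row_number() gives distinct numbers even to equal totals, so ties at the cutoff are broken arbitrarily; rank() or dense_rank() would keep tied drivers together.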