The Grouped Top N Problem

2021-08-21 18:43:08

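In an interview today, the interviewer gave me the following scenario:

There are two tables. One stores the fleet id, team id, and driver id.

The other stores the driver id, operating time, and operating mileage.

The task is to find the top 3 drivers by mileage in each team of each fleet for July.

This calls for the row_number() function.

First, create the two tables the scenario requires.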

create table demo_of_topn_car
(
    companyid varchar(8),   -- fleet id
    classid   varchar(8),   -- team id
    driverid  varchar(8)    -- driver id
);

create table demo_of_topn_operating
(
    driverid  varchar(8),   -- driver id
    operadate datetime,     -- operating time
    mileage   decimal(5,2)  -- operating mileage
);

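Step 1: compute the total mileage of each driver in each team of each fleet. Note that the query below sums over all dates; restricting it to July needs an additional condition on operadate.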

select car.companyid, car.classid, operat.driverid, sum(mileage) mileage
from (
    select companyid, classid
    from demo_of_topn_car
    group by companyid, classid
) car
left join (
    select a.companyid, a.classid, a.driverid, b.mileage
    from demo_of_topn_car a
    left join demo_of_topn_operating b on a.driverid = b.driverid
) operat on car.companyid = operat.companyid and car.classid = operat.classid
group by car.companyid, car.classid, operat.driverid
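The step-1 query does not filter by month, so here is a minimal sketch of how the July restriction could be added. It is not from the interview: it assumes SQLite driven from Python's sqlite3 module (the post does not name a database), it assumes the year is 2021, and the sample rows are made up. The date condition sits in the join so that drivers with no July trips still appear, with a NULL total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table demo_of_topn_car (companyid varchar(8), classid varchar(8), driverid varchar(8));
create table demo_of_topn_operating (driverid varchar(8), operadate datetime, mileage decimal(5,2));

-- Made-up sample data: d1 has two July trips and one August trip, d2 only a June trip.
insert into demo_of_topn_car values ('c1', 'k1', 'd1'), ('c1', 'k1', 'd2');
insert into demo_of_topn_operating values
    ('d1', '2021-07-03 08:00:00', 100.5),
    ('d1', '2021-07-20 09:00:00', 50.0),
    ('d1', '2021-08-01 10:00:00', 999.0),
    ('d2', '2021-06-30 23:00:00', 300.0);
""")

# Half-open date range [July 1, August 1); the year 2021 is an assumption.
july_totals = conn.execute("""
select a.companyid, a.classid, a.driverid, sum(b.mileage) as mileage
from demo_of_topn_car a
left join demo_of_topn_operating b
       on a.driverid = b.driverid
      and b.operadate >= '2021-07-01'
      and b.operadate <  '2021-08-01'
group by a.companyid, a.classid, a.driverid
order by a.driverid
""").fetchall()

for row in july_totals:
    print(row)
```

The comparison relies on the dates being stored as ISO-formatted text, which is how SQLite keeps them here. On other databases a native date comparison would play the same role.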

Next, number the rows of that result set within each fleet and team, ordered by mileage.

select companyid, classid, driverid,
       row_number() over (partition by companyid, classid order by mileage desc) rid
from (
    select car.companyid, car.classid, operat.driverid, sum(mileage) mileage
    from (
        select companyid, classid
        from demo_of_topn_car
        group by companyid, classid
    ) car
    left join (
        select a.companyid, a.classid, a.driverid, b.mileage
        from demo_of_topn_car a
        left join demo_of_topn_operating b on a.driverid = b.driverid
    ) operat on car.companyid = operat.companyid and car.classid = operat.classid
    group by car.companyid, car.classid, operat.driverid
    -- no order by here: the ordering that matters is the one inside over(...)
) t1

Remember: to get the top N (the largest values), the order by inside over() must be descending.
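To see why the direction matters, here is a small sketch that numbers the same four totals both ways. It assumes SQLite 3.25 or later (needed for window functions) through Python's sqlite3 module, and the driver ids and mileage values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Four made-up per-driver totals, numbered in both directions.
rows = conn.execute("""
with totals(driverid, mileage) as (
    values ('d1', 25), ('d2', 40), ('d3', 30), ('d4', 5)
)
select driverid,
       row_number() over (order by mileage desc) as rid_desc,
       row_number() over (order by mileage asc)  as rid_asc
from totals
order by driverid
""").fetchall()

for driverid, rid_desc, rid_asc in rows:
    print(driverid, rid_desc, rid_asc)
```

With descending order, d2 (the largest total) gets number 1, so rid <= 3 keeps the three largest. With ascending order, d4 (the smallest) gets number 1, and the same filter would return the bottom three instead.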

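Finally, wrap the query in one more outer layer, or use with, because rid cannot be filtered in the where clause of the same select that computes it.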

with res as (
    select companyid, classid, driverid,
           row_number() over (partition by companyid, classid order by mileage desc) rid
    from (
        select car.companyid, car.classid, operat.driverid, sum(mileage) mileage
        from (
            select companyid, classid
            from demo_of_topn_car
            group by companyid, classid
        ) car
        left join (
            select a.companyid, a.classid, a.driverid, b.mileage
            from demo_of_topn_car a
            left join demo_of_topn_operating b on a.driverid = b.driverid
        ) operat on car.companyid = operat.companyid and car.classid = operat.classid
        group by car.companyid, car.classid, operat.driverid
        -- no order by here: the ordering that matters is the one inside over(...)
    ) t1
)
select * from res where rid <= 3
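To check the whole approach end to end, here is a runnable sketch. It assumes SQLite 3.25 or later via Python's sqlite3 module, and the sample rows are made up. It also splits the work into two named steps (totals, then res) and joins the two tables directly, which is a simplification I chose for readability and not the exact query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table demo_of_topn_car (companyid varchar(8), classid varchar(8), driverid varchar(8));
create table demo_of_topn_operating (driverid varchar(8), operadate datetime, mileage decimal(5,2));

-- Made-up sample data: team k1 has four drivers, team k2 has one.
insert into demo_of_topn_car values
    ('c1', 'k1', 'd1'), ('c1', 'k1', 'd2'), ('c1', 'k1', 'd3'), ('c1', 'k1', 'd4'),
    ('c1', 'k2', 'd5');
insert into demo_of_topn_operating values
    ('d1', '2021-07-01 08:00:00', 10),
    ('d1', '2021-07-02 08:00:00', 15),
    ('d2', '2021-07-01 08:00:00', 40),
    ('d3', '2021-07-01 08:00:00', 30),
    ('d4', '2021-07-01 08:00:00', 5),
    ('d5', '2021-07-01 08:00:00', 20);
""")

top3 = conn.execute("""
with totals as (
    -- Step 1: total mileage per fleet, team, and driver.
    select a.companyid, a.classid, a.driverid, sum(b.mileage) as mileage
    from demo_of_topn_car a
    left join demo_of_topn_operating b on a.driverid = b.driverid
    group by a.companyid, a.classid, a.driverid
),
res as (
    -- Step 2: number the drivers inside each fleet and team, largest total first.
    select companyid, classid, driverid, mileage,
           row_number() over (partition by companyid, classid order by mileage desc) as rid
    from totals
)
-- Step 3: keep at most three rows per fleet and team.
select companyid, classid, driverid, mileage, rid
from res
where rid <= 3
order by companyid, classid, rid
""").fetchall()

for row in top3:
    print(row)
```

Team k1 keeps d2, d3, and d1 and drops d4, while team k2 keeps its single driver. The filter is a cap of three rows per group, not a guarantee of exactly three.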
