Hive task: find the shops that have sales records on three consecutive days

2021-09-09 07:47:00

Sample data (shopid,dt,sale):
a,2017-10-11,300
a,2017-10-12,200
a,2017-10-13,100
a,2017-10-15,100
a,2017-10-16,300
a,2017-10-17,150
a,2017-10-18,340
a,2017-10-19,360
b,2017-10-11,400
b,2017-10-12,200
b,2017-10-15,600
c,2017-10-11,350
c,2017-10-13,250
c,2017-10-14,300
c,2017-10-15,400
c,2017-10-16,200
d,2017-10-13,500
e,2017-10-14,600
e,2017-10-15,500
d,2017-10-14,600

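Analysis: number each shop's records in date order (1, 2, 3, ...), then subtract that number, in days, from the date. Within a run of consecutive days, the date and the number both grow by 1 per row, so the result stays the same. Rows of one shop that give the same result therefore form one unbroken run, and any gap in the dates changes the result.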

shopid,dt,sale,rn,flag
a,2017-10-11,300,1,2017-10-10
a,2017-10-12,200,2,2017-10-10
a,2017-10-13,100,3,2017-10-10
a,2017-10-15,100,4,2017-10-11
a,2017-10-16,300,5,2017-10-11
a,2017-10-17,150,6,2017-10-11
a,2017-10-18,340,7,2017-10-11
a,2017-10-19,360,8,2017-10-11
b,2017-10-11,400,1,2017-10-10
b,2017-10-12,200,2,2017-10-10
b,2017-10-15,600,3,2017-10-12
c,2017-10-11,350,1,2017-10-10
c,2017-10-13,250,2,2017-10-11
c,2017-10-14,300,3,2017-10-11
c,2017-10-15,400,4,2017-10-11
c,2017-10-16,200,5,2017-10-11
d,2017-10-13,500,1,2017-10-12
e,2017-10-14,600,1,2017-10-13
e,2017-10-15,500,2,2017-10-13
d,2017-10-14,600,2,2017-10-12
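The following short derivation shows why the trick works. It assumes each shop has at most one record per day, which holds for the sample data. Write a shop's sale dates as day numbers $d_1 < d_2 < \dots < d_n$ in `dt` order. Row $i$ then gets `rn` $= i$ and flag $d_i - i$:

```latex
\begin{aligned}
&d_{k+1} - d_k \ge 1 \quad (1 \le k < n,\ \text{distinct days}) \\
&d_j - d_i = \sum_{k=i}^{j-1} (d_{k+1} - d_k) \ge j - i \quad (i < j) \\
&(d_j - j) - (d_i - i) = (d_j - d_i) - (j - i) \ge 0
\end{aligned}
```

Equality in the last line holds exactly when $d_{k+1} - d_k = 1$ for every $k$ from $i$ to $j-1$, that is, when rows $i$ through $j$ fall on consecutive days. The flag therefore never decreases. It stays constant across an unbroken run and jumps at every gap, so counting rows per flag gives the run lengths. For shop a, that is 3 days flagged 2017-10-10 and 5 days flagged 2017-10-11.

If a shop could have two rows on the same day, `row_number()` would give them different numbers and split the run. In that case the dates would need to be deduplicated first, for example with `select distinct shopid, dt`.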

1: Create the table and load the data

create table t_jd(shopid string, dt string, sale int)
row format delimited fields terminated by ',';

load data local inpath '/root/sale.dat' into table t_jd;
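`load data local inpath` copies the file into the table as-is. Splitting each line into columns is done by the table's `row format delimited fields terminated by ','` when the data is read. Below is a rough Python picture of that mapping. It assumes, as a simplification, that every line has exactly three fields. `parse_line` and `SaleRow` are illustrative names, not Hive APIs.

```python
from typing import NamedTuple, Optional


class SaleRow(NamedTuple):
    """One row of t_jd(shopid string, dt string, sale int)."""
    shopid: str
    dt: str
    sale: Optional[int]  # None plays the role of Hive's NULL


def parse_line(line: str) -> SaleRow:
    """Split one line of sale.dat on ',' like `fields terminated by ','`.

    Simplifying assumption: every line has exactly three fields.
    """
    shopid, dt, sale = line.rstrip("\n").split(",")
    try:
        amount: Optional[int] = int(sale)
    except ValueError:
        # Hive's default text format reads a field that is not a valid int as NULL
        amount = None
    return SaleRow(shopid, dt, amount)


print(parse_line("a,2017-10-11,300"))
print(parse_line("a,2017-10-11,n/a"))
```

Because the declared `dt` column is a `string`, no date parsing happens at load time. Later steps convert it with `to_date`.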

2: Number each shop's rows by date

select shopid, dt, sale,
       row_number() over(partition by shopid order by dt) as rn
from t_jd;
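`row_number() over(partition by shopid order by dt)` restarts the count at 1 for each shop and numbers that shop's rows in date order, wherever they sit in the file. The Python emulation below uses the last four lines of the sample, where shop d's two rows are separated by shop e's. `add_row_numbers` is a hypothetical helper. It sorts its output by shop only for readability; Hive does not guarantee any output order without `order by`.

```python
from collections import defaultdict
from typing import List, Tuple

# Last four lines of the sample in file order: shop d's rows are not adjacent
ROWS: List[Tuple[str, str, int]] = [
    ("d", "2017-10-13", 500),
    ("e", "2017-10-14", 600),
    ("e", "2017-10-15", 500),
    ("d", "2017-10-14", 600),
]


def add_row_numbers(rows: List[Tuple[str, str, int]]) -> List[Tuple[str, str, int, int]]:
    """Emulate row_number() over(partition by shopid order by dt)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)  # partition by shopid
    result = []
    for shopid in sorted(partitions):
        # 'yyyy-MM-dd' strings sort chronologically, so ordering by the string is enough
        ordered = sorted(partitions[shopid], key=lambda r: r[1])
        for rn, row in enumerate(ordered, start=1):
            result.append(row + (rn,))
    return result


for numbered in add_row_numbers(ROWS):
    print(numbered)
```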

3: Subtract the row number from the date to get a run flag

select shopid, dt, sale, rn,
       date_sub(to_date(dt), rn) as flag
from
  (select shopid, dt, sale,
          row_number() over(partition by shopid order by dt) as rn
   from t_jd) tmp;
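`to_date(dt)` keeps the date part of a date or timestamp string. Here `dt` is already `yyyy-MM-dd`, so `to_date` leaves it unchanged. `date_sub(d, n)` then moves back `n` days. The subtraction has to be real calendar arithmetic, not string or digit arithmetic, because a run can cross a month end. Below is a Python sketch of the same computation for shop a. `date_sub` here is a local stand-in, not Hive's implementation.

```python
from datetime import date, timedelta


def date_sub(dt: str, days: int) -> str:
    """Stand-in for Hive's date_sub(to_date(dt), days) on a 'yyyy-MM-dd' string."""
    return (date.fromisoformat(dt) - timedelta(days=days)).isoformat()


# Shop a's dates in dt order; enumerate supplies rn = 1, 2, 3, ...
SHOP_A = ["2017-10-11", "2017-10-12", "2017-10-13", "2017-10-15",
          "2017-10-16", "2017-10-17", "2017-10-18", "2017-10-19"]

flags = [date_sub(dt, rn) for rn, dt in enumerate(SHOP_A, start=1)]
for rn, (dt, flag) in enumerate(zip(SHOP_A, flags), start=1):
    print(dt, rn, flag)

# Calendar arithmetic handles month boundaries: one day before 2017-11-01
print(date_sub("2017-11-01", 1))
```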

4: Group by shop and flag, and count the rows in each group

select shopid, count(1) as cnt
from
  (select shopid, dt, sale, rn,
          date_sub(to_date(dt), rn) as flag
   from
     (select shopid, dt, sale,
             row_number() over(partition by shopid order by dt) as rn
      from t_jd) tmp) tmp2
group by shopid, flag;
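Grouping by `(shopid, flag)` turns each unbroken run into one group, and `count(1)` is the run length. `flag` is only a grouping key here and is not in the select list, so the query returns just `shopid` and `cnt`. The Python emulation below covers steps 2 to 4 over the whole sample. `group_counts` is a hypothetical helper, and it assumes one row per shop per day, as in the sample.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, List, Tuple

SALES: List[Tuple[str, str, int]] = [
    ("a", "2017-10-11", 300), ("a", "2017-10-12", 200), ("a", "2017-10-13", 100),
    ("a", "2017-10-15", 100), ("a", "2017-10-16", 300), ("a", "2017-10-17", 150),
    ("a", "2017-10-18", 340), ("a", "2017-10-19", 360),
    ("b", "2017-10-11", 400), ("b", "2017-10-12", 200), ("b", "2017-10-15", 600),
    ("c", "2017-10-11", 350), ("c", "2017-10-13", 250), ("c", "2017-10-14", 300),
    ("c", "2017-10-15", 400), ("c", "2017-10-16", 200),
    ("d", "2017-10-13", 500), ("e", "2017-10-14", 600), ("e", "2017-10-15", 500),
    ("d", "2017-10-14", 600),
]


def group_counts(rows: List[Tuple[str, str, int]]) -> Counter:
    """Emulate steps 2-4: rn per shop in dt order, flag = dt minus rn days, count per (shopid, flag)."""
    dates_by_shop: Dict[str, List[str]] = {}
    for shopid, dt, _sale in rows:
        dates_by_shop.setdefault(shopid, []).append(dt)  # partition by shopid
    counts: Counter = Counter()
    for shopid, dts in dates_by_shop.items():
        for rn, dt in enumerate(sorted(dts), start=1):  # order by dt
            flag = (date.fromisoformat(dt) - timedelta(days=rn)).isoformat()
            counts[(shopid, flag)] += 1  # group by shopid, flag
    return counts


for (shopid, flag), cnt in sorted(group_counts(SALES).items()):
    print(shopid, flag, cnt)
```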

5: Keep only the groups with 3 or more consecutive days

select shopid from
  (select shopid, count(1) as cnt
   from
     (select shopid, dt, sale, rn,
             date_sub(to_date(dt), rn) as flag
      from
        (select shopid, dt, sale,
                row_number() over(partition by shopid order by dt) as rn
         from t_jd) tmp) tmp2
   group by shopid, flag) tmp3
where tmp3.cnt >= 3;
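The outer `where ... cnt >= 3` keeps only runs of at least three days. The filter works on runs rather than shops, so a shop with two qualifying runs is returned twice. Shop a has both a 3-day run (flag 2017-10-10) and a 5-day run (flag 2017-10-11). The sketch below starts from the groups that step 4 yields on the sample data. `long_runs` and its `min_days` parameter are illustrative names.

```python
from typing import List, Tuple

# (shopid, flag, cnt) groups that the step-4 query produces for the sample data
GROUPS: List[Tuple[str, str, int]] = [
    ("a", "2017-10-10", 3), ("a", "2017-10-11", 5),
    ("b", "2017-10-10", 2), ("b", "2017-10-12", 1),
    ("c", "2017-10-10", 1), ("c", "2017-10-11", 4),
    ("d", "2017-10-12", 2), ("e", "2017-10-13", 2),
]


def long_runs(groups: List[Tuple[str, str, int]], min_days: int = 3) -> List[str]:
    """Emulate `select shopid ... where cnt >= 3`: one output row per qualifying run."""
    return [shopid for shopid, _flag, cnt in groups if cnt >= min_days]


print(long_runs(GROUPS))
```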

6: Remove duplicates (step 5 returns a shop once per qualifying run, so shop a appears twice)

select distinct shopid from
  (select shopid, count(1) as cnt
   from
     (select shopid, dt, sale, rn,
             date_sub(to_date(dt), rn) as flag
      from
        (select shopid, dt, sale,
                row_number() over(partition by shopid order by dt) as rn
         from t_jd) tmp) tmp2
   group by shopid, flag) tmp3
where tmp3.cnt >= 3;
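To check the whole pipeline end to end without a Hive cluster, the same nested query can be run in SQLite through Python's standard `sqlite3` module. This uses a substitute engine, not Hive, and makes three assumptions:

- SQLite is version 3.25 or later, which is needed for `row_number()`.
- `date_sub(to_date(dt), rn)` is replaced by SQLite's `date(dt, '-' || rn || ' days')`.
- `order by shopid` is added only to make the output order stable.

On the sample data it should return shops a and c. `shops_with_three_day_runs` is a hypothetical helper name.

```python
import sqlite3
from typing import List, Tuple

SALES: List[Tuple[str, str, int]] = [
    ("a", "2017-10-11", 300), ("a", "2017-10-12", 200), ("a", "2017-10-13", 100),
    ("a", "2017-10-15", 100), ("a", "2017-10-16", 300), ("a", "2017-10-17", 150),
    ("a", "2017-10-18", 340), ("a", "2017-10-19", 360),
    ("b", "2017-10-11", 400), ("b", "2017-10-12", 200), ("b", "2017-10-15", 600),
    ("c", "2017-10-11", 350), ("c", "2017-10-13", 250), ("c", "2017-10-14", 300),
    ("c", "2017-10-15", 400), ("c", "2017-10-16", 200),
    ("d", "2017-10-13", 500), ("e", "2017-10-14", 600), ("e", "2017-10-15", 500),
    ("d", "2017-10-14", 600),
]

# Step-6 query with SQLite's date() standing in for Hive's date_sub(to_date(dt), rn)
QUERY = """
select distinct shopid from
  (select shopid, count(1) as cnt
   from
     (select shopid, dt, sale, rn,
             date(dt, '-' || rn || ' days') as flag
      from
        (select shopid, dt, sale,
                row_number() over (partition by shopid order by dt) as rn
         from t_jd) tmp) tmp2
   group by shopid, flag) tmp3
where tmp3.cnt >= 3
order by shopid
"""


def shops_with_three_day_runs(rows: List[Tuple[str, str, int]]) -> List[str]:
    """Load rows into an in-memory t_jd table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t_jd(shopid text, dt text, sale integer)")
        conn.executemany("insert into t_jd values (?, ?, ?)", rows)
        return [shopid for (shopid,) in conn.execute(QUERY)]
    finally:
        conn.close()


print(shops_with_three_day_runs(SALES))
```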
