The lag and lead analytic functions: reading values from the previous and next rows

2021-06-29 11:02:33

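[Syntax]

lag(expr [, offset [, default]]) over ([partition by col1] order by col2)

lead(expr [, offset [, default]]) over ([partition by col1] order by col2)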

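[Function] Rows are grouped by col1 (partition by) and sorted by col2 within each group (order by). For the current row, the function returns the value of expr from another row in the same group, located by its position in that sort order, with no self-join needed.

lead() returns the value from the next row; lag() returns the value from the previous row.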

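[Parameters]

expr is the expression whose value is returned from the other row.

offset is a positive integer, 1 by default, giving the number of rows to move away from the current row within the partition.

default is the value returned when offset points outside the partition. If it is omitted, null is returned.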
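To see how offset and default behave without an Oracle instance at hand, here is a minimal sketch using Python's built-in sqlite3 module. It rests on two assumptions: SQLite (3.25 or later) is only a stand-in for Oracle, although its lag and lead accept the same three arguments, and the table lead_demo with its seq column is hypothetical, made up for this illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical demo table: seq gives the order of steps inside each case.
conn.execute("create table lead_demo (caseid text, stepid text, seq integer)")
conn.executemany(
    "insert into lead_demo values (?, ?, ?)",
    [
        ("case1", "step1", 1),
        ("case1", "step2", 2),
        ("case1", "step3", 3),
        ("case2", "step1", 1),
        ("case2", "step2", 2),
    ],
)

rows = conn.execute(
    """
    select caseid,
           stepid,
           -- offset defaults to 1 and default to null
           lag(stepid) over (partition by caseid order by seq) as prev1,
           -- look two rows back; return 'none' when that leaves the partition
           lag(stepid, 2, 'none') over (partition by caseid order by seq) as prev2,
           -- look one row ahead; return 'none' on the last row of the partition
           lead(stepid, 1, 'none') over (partition by caseid order by seq) as next1
    from lead_demo
    order by caseid, seq
    """
).fetchall()

for row in rows:
    print(row)
```

The first row of each case has no previous row, so prev1 is null (None in Python), while prev2 and next1 fall back to 'none' whenever the offset would cross the partition boundary.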

[Note] These are Oracle analytic functions.

[Example]

-- create table
-- (the tablespace and storage clauses are environment-specific; adjust or omit them as needed)
create table lead_table
(
  caseid     varchar2(10),
  stepid     varchar2(10),
  actiondate date
)
tablespace colm_data
  pctfree 10
  initrans 1
  maxtrans 255
  storage
  (
    initial 64k
    minextents 1
    maxextents unlimited
  );

-- sample data: seven steps for case1, three steps for case2
insert into lead_table values('case1','step1',to_date('20070101','yyyymmdd'));
insert into lead_table values('case1','step2',to_date('20070102','yyyymmdd'));
insert into lead_table values('case1','step3',to_date('20070103','yyyymmdd'));
insert into lead_table values('case1','step4',to_date('20070104','yyyymmdd'));
insert into lead_table values('case1','step5',to_date('20070105','yyyymmdd'));
insert into lead_table values('case1','step4',to_date('20070106','yyyymmdd'));
insert into lead_table values('case1','step6',to_date('20070107','yyyymmdd'));
insert into lead_table values('case2','step1',to_date('20070201','yyyymmdd'));
insert into lead_table values('case2','step2',to_date('20070202','yyyymmdd'));
insert into lead_table values('case2','step3',to_date('20070203','yyyymmdd'));

commit;

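Selecting each row together with its next and previous rows (the lead/lag query over lead_table given further below) returns the following:

caseid  stepid  actiondate  nextstepid  nextactiondate  prestepid  preactiondate
case1   step1   2007-1-1    step2       2007-1-2        (null)     (null)
case1   step2   2007-1-2    step3       2007-1-3        step1      2007-1-1
case1   step3   2007-1-3    step4       2007-1-4        step2      2007-1-2
case1   step4   2007-1-4    step5       2007-1-5        step3      2007-1-3
case1   step5   2007-1-5    step4       2007-1-6        step4      2007-1-4
case1   step4   2007-1-6    step6       2007-1-7        step5      2007-1-5
case1   step6   2007-1-7    (null)      (null)          step4      2007-1-6
case2   step1   2007-2-1    step2       2007-2-2        (null)     (null)
case2   step2   2007-2-2    step3       2007-2-3        step1      2007-2-1
case2   step3   2007-2-3    (null)      (null)          step2      2007-2-2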

You can go one step further and compute the number of days between each action and the next one:

select caseid, stepid, actiondate, nextactiondate,
       nextactiondate - actiondate datebetween
from (
  select caseid, stepid, actiondate,
         lead(stepid)     over (partition by caseid order by actiondate) nextstepid,
         lead(actiondate) over (partition by caseid order by actiondate) nextactiondate,
         lag(stepid)      over (partition by caseid order by actiondate) prestepid,
         lag(actiondate)  over (partition by caseid order by actiondate) preactiondate
  from lead_table
);

The result is as follows:

caseid  stepid  actiondate  nextactiondate  datebetween
case1   step1   2007-1-1    2007-1-2        1
case1   step2   2007-1-2    2007-1-3        1
case1   step3   2007-1-3    2007-1-4        1
case1   step4   2007-1-4    2007-1-5        1
case1   step5   2007-1-5    2007-1-6        1
case1   step4   2007-1-6    2007-1-7        1
case1   step6   2007-1-7    (null)          (null)
case2   step1   2007-2-1    2007-2-2        1
case2   step2   2007-2-2    2007-2-3        1
case2   step3   2007-2-3    (null)          (null)
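The day-difference calculation can also be reproduced outside Oracle. The sketch below is an approximation under stated assumptions, not the Oracle query itself: it uses Python's built-in sqlite3 module (SQLite 3.25 or later), stores the dates as ISO text because SQLite has no date type, and replaces Oracle's date subtraction with SQLite's julianday function. The rows are the ten (caseid, stepid, actiondate) combinations from the result listing.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")

# Dates are stored as ISO-8601 text, since SQLite has no native date type.
conn.execute("create table lead_table (caseid text, stepid text, actiondate text)")
conn.executemany(
    "insert into lead_table values (?, ?, ?)",
    [
        ("case1", "step1", "2007-01-01"),
        ("case1", "step2", "2007-01-02"),
        ("case1", "step3", "2007-01-03"),
        ("case1", "step4", "2007-01-04"),
        ("case1", "step5", "2007-01-05"),
        ("case1", "step4", "2007-01-06"),
        ("case1", "step6", "2007-01-07"),
        ("case2", "step1", "2007-02-01"),
        ("case2", "step2", "2007-02-02"),
        ("case2", "step3", "2007-02-03"),
    ],
)

gaps = conn.execute(
    """
    select caseid, stepid, actiondate, nextactiondate,
           -- julianday() stands in for Oracle's date - date arithmetic
           julianday(nextactiondate) - julianday(actiondate) as datebetween
    from (
        select caseid, stepid, actiondate,
               lead(actiondate) over (partition by caseid order by actiondate)
                   as nextactiondate
        from lead_table
    )
    order by caseid, actiondate
    """
).fetchall()

for row in gaps:
    print(row)
```

On the last row of each case, lead finds no following row, so nextactiondate is null and the difference is null as well. Passing a default as the third argument of lead is one way to avoid that, if a placeholder date makes sense for the data.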

Every record can be linked to the contents of its previous and next rows:

select caseid, stepid, actiondate,
       lead(stepid)     over (partition by caseid order by actiondate) nextstepid,
       lead(actiondate) over (partition by caseid order by actiondate) nextactiondate,
       lag(stepid)      over (partition by caseid order by actiondate) prestepid,
       lag(actiondate)  over (partition by caseid order by actiondate) preactiondate
from lead_table;
