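Browsing logs often contain repeated records. When cleaning the data, rows in which the same page appears several times in a row need to be collapsed so that only one row per run remains.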
1: Create the table
create table temp.kaka_test (
    cuid string,
    `timestamp` string,  -- backquoted because timestamp is a reserved word in Hive
    pagename string,
    elementid string,
    eventtype string
);
2: Insert the data
-- Every timestamp is quoted so that all branches of the union match the string column.
insert into table temp.kaka_test
select '1111' as cuid, '1572623111' as `timestamp`, 'home' as pagename, 'pv' as elementid, 'view' as eventtype
union all
select '1111' as cuid, '1572623112' as `timestamp`, 'goodsdetail' as pagename, 'pv' as elementid, 'view' as eventtype
union all
select '1111' as cuid, '1572623113' as `timestamp`, 'mine' as pagename, 'pv' as elementid, 'view' as eventtype
union all
select '1111' as cuid, '1572623114' as `timestamp`, 'goodsdetail' as pagename, 'pv' as elementid, 'view' as eventtype
union all
select '1111' as cuid, '1572623115' as `timestamp`, 'goodsdetail' as pagename, 'pv' as elementid, 'view' as eventtype
union all
select '1111' as cuid, '1572623116' as `timestamp`, 'goodsdetail' as pagename, 'pv' as elementid, 'view' as eventtype
union all
select '1111' as cuid, '1572623117' as `timestamp`, 'cart' as pagename, 'pv' as elementid, 'view' as eventtype
union all
select '1111' as cuid, '1572623118' as `timestamp`, 'home' as pagename, 'pv' as elementid, 'view' as eventtype;
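3: Problem description: as shown in the figure below, the goodsdetail page appears three times in a row (timestamps 1572623114 through 1572623116); only one of those rows should be kept.
4: Sort each user's rows by timestamp and use lag to fetch, for every row, the page that came immediately before it.
5: Filter out the rows whose page is the same as the previous page.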
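Steps 4 and 5 can be sketched as a single Hive query. This is a minimal sketch, not necessarily the exact query the post had in mind: it assumes the temp.kaka_test table and sample rows from steps 1 and 2, partitions by cuid so that one user's pages are never compared with another's, and uses prev_pagename as an illustrative alias.

```sql
-- Sketch only: assumes temp.kaka_test as created and filled in steps 1 and 2.
-- prev_pagename is an illustrative alias, not a name from the original post.
select cuid, `timestamp`, pagename, elementid, eventtype
from (
    select
        cuid,
        `timestamp`,
        pagename,
        elementid,
        eventtype,
        -- Step 4: previous page of the same user in time order.
        -- The cast makes the ordering numeric; lag returns NULL on a user's first row.
        lag(pagename) over (
            partition by cuid
            order by cast(`timestamp` as bigint)
        ) as prev_pagename
    from temp.kaka_test
) t
-- Step 5: keep a row only when its page differs from the previous one.
-- The explicit NULL check keeps each user's first page, which has no predecessor.
where prev_pagename is null
   or pagename <> prev_pagename
order by cuid, `timestamp`;
```

On the sample data this keeps six rows: home, goodsdetail, mine, goodsdetail (1572623114), cart, home. The rows at 1572623115 and 1572623116 are dropped because their previous page is also goodsdetail. Keeping the first row of each run is a choice; keeping the last would use lead instead of lag.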