Hive SQL Notes and Exercises

2021-10-14 15:01:59

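unix_timestamp: returns the Unix timestamp of the current time or of a given time

from_unixtime: converts a Unix timestamp to a formatted date string

current_date: the current date

current_timestamp: the current date and time

to_date: extracts the date part

year: extracts the year

month: extracts the month

day: extracts the day

hour: extracts the hour

minute: extracts the minute

second: extracts the second

weekofyear: the week of the year that the date falls in

dayofmonth: the day of the month that the date falls on

months_between: the number of months between two dates

add_months: adds months to a date (a negative value subtracts)

datediff: the number of days between two dates

date_add: adds days to a date

date_sub: subtracts days from a date

last_day: the last day of the month that the date falls in

date_format: returns the date formatted with the given pattern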
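A few of the date functions above can be tried in one query. This is a minimal sketch: the literal dates and the column aliases are made up for illustration, and it assumes a Hive version that accepts SELECT without a FROM clause.

```sql
-- Date arithmetic on literal dates (values chosen only for illustration).
select
    datediff('2021-10-14', '2021-10-01')  as days_apart,      -- 13
    date_add('2021-10-14', 7)             as one_week_later,  -- 2021-10-21
    date_sub('2021-10-14', 14)            as two_weeks_back,  -- 2021-09-30
    add_months('2021-10-14', 2)           as two_months_on,   -- 2021-12-14
    last_day('2021-10-14')                as month_end,       -- 2021-10-31
    year('2021-10-14')                    as y,               -- 2021
    month('2021-10-14')                   as m,               -- 10
    date_format('2021-10-14', 'yyyy/MM')  as ym;              -- 2021/10
```

unix_timestamp and from_unixtime are left out of the sketch because their results depend on the session time zone.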

round: rounds to the nearest value

ceil: rounds up to the nearest integer

floor: rounds down to the nearest integer
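The three rounding functions differ mainly in direction, which a negative input makes visible. A small sketch with made-up literals (the aliases are illustrative only):

```sql
-- Rounding behaviour on literal values.
select
    round(3.14159, 2)  as rounded,      -- 3.14 (second argument = decimal places)
    ceil(3.1)          as rounded_up,   -- 4
    floor(-3.1)        as rounded_down; -- -4 (floor moves toward negative infinity)
```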

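upper: converts to uppercase

lower: converts to lowercase

length: length of the string

trim: removes leading and trailing spaces

lpad: pads on the left to the given length

rpad: pads on the right to the given length

regexp_replace: select regexp_replace('100-200', '(\\d+)', 'num'); matches the target string against a regular expression and replaces every match. In a Hive string literal the backslash must be doubled, so the digit class is written '\\d'.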
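The string functions above, applied to literal inputs. This is only a sketch: the input strings and aliases are invented for illustration.

```sql
-- String helpers on literal values.
select
    upper('hive')                               as upper_s,   -- HIVE
    lower('HIVE')                               as lower_s,   -- hive
    length('hive')                              as len_s,     -- 4
    trim('  hive  ')                            as trim_s,    -- hive
    lpad('7', 3, '0')                           as lpad_s,    -- 007
    rpad('7', 3, '0')                           as rpad_s,    -- 700
    regexp_replace('100-200', '(\\d+)', 'num')  as regex_s;   -- num-num
```

Both runs of digits are replaced because regexp_replace substitutes every match, not just the first.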

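size: number of elements in a collection

map_keys: returns the keys of a map

map_values: returns the values of a map

array_contains: checks whether an array contains a given element

sort_array: sorts the elements of an array in ascending order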
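A short sketch of the collection functions, using the array() and map() constructors to build literal inputs. The values and aliases are made up for illustration.

```sql
-- Collection helpers on literal arrays and maps.
select
    size(array(3, 1, 2))               as n_elems,   -- 3
    map_keys(map('a', 1, 'b', 2))      as ks,        -- the keys a and b (order not guaranteed)
    map_values(map('a', 1, 'b', 2))    as vs,        -- the values 1 and 2 (order not guaranteed)
    array_contains(array(3, 1, 2), 2)  as has_two,   -- true
    sort_array(array(3, 1, 2))         as sorted;    -- [1,2,3]
```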

-- Top 10 videos by view count
select
    videoid,
    uploader,
    views
from gulivideo_orc
order by views desc
limit 10;

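-- Top 10 categories by number of videos
select
    category_name,
    count(*) c_n
from gulivideo_orc
lateral view explode(category) tmp as category_name -- the lateral view adds a new column while the source rows are being prepared
group by category_name
order by c_n desc
limit 10;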
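What lateral view explode does to a single row is easier to see on inline data. In this sketch the row 'v1' and its two categories are hypothetical; only the column names follow the queries in this post.

```sql
-- One input row with a two-element array becomes two output rows,
-- one per array element, with the other columns repeated.
select
    t.videoid,
    category_name
from (
    select 'v1' as videoid, array('Music', 'Comedy') as category
) t
lateral view explode(t.category) tmp as category_name;
```

The result has one row pairing v1 with Music and one pairing v1 with Comedy, which is why a video with several categories is counted once in each of them after the group by.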

-- Categories of the 20 most-viewed videos, with the number of those videos in each category
select
    category_name,
    count(*) c_n
from (
    select
        videoid,
        uploader,
        views,
        category
    from gulivideo_orc
    order by views desc
    limit 20
) t1
lateral view explode(t1.category) tmp as category_name
group by category_name;

-- Category ranking of the videos related to the 50 most-viewed videos
select
    category_name,
    count(*) c_n
from (
    select
        t3.videoid,
        t3.category
    from (
        select
            relatedid_name
        from (
            select
                videoid,
                uploader,
                views,
                relatedid
            from gulivideo_orc
            order by views desc
            limit 50
        ) t1
        lateral view explode(t1.relatedid) tmp as relatedid_name
    ) t2
    join gulivideo_orc t3
        on t2.relatedid_name = t3.videoid
) t4
lateral view explode(t4.category) tmp as category_name
group by category_name
order by c_n desc;

-- Top 10 videos by view count within the 'music' category
select
    t1.videoid,
    t1.views,
    t1.category_name
from (
    select
        videoid,
        views,
        category_name
    from gulivideo_orc
    lateral view explode(category) tmp as category_name
) t1
where t1.category_name = 'music'
order by t1.views desc
limit 10;

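-- Top 10 videos by view count within each category
select
    t2.videoid,
    t2.views,
    t2.category_name,
    t2.rk
from (
    select
        t1.videoid,
        t1.views,
        t1.category_name,
        rank() over (partition by t1.category_name order by t1.views desc) rk
        -- a window function keeps the row count unchanged but adds a column, and orders the rows within each partition
    from (
        select
            videoid,
            views,
            category_name
        from gulivideo_orc
        lateral view explode(category) tmp as category_name
        -- with group by category, the individual rows of each category are no longer available, so the top 10 per category cannot be taken
        -- a window function does not change the number of rows, so it is used here instead
    ) t1
) t2
where t2.rk <= 10;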
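How rank() numbers rows inside each partition, including ties, can be checked on a few inline rows. The four rows below are hypothetical; only the column names mirror the query above.

```sql
-- rank() restarts at 1 in every partition and gives tied rows the same rank,
-- skipping the following rank number(s).
select
    t.category_name,
    t.videoid,
    t.views,
    rank() over (partition by t.category_name order by t.views desc) as rk
from (
    select 'Music'  as category_name, 'a' as videoid, 300 as views
    union all
    select 'Music'  as category_name, 'b' as videoid, 300 as views
    union all
    select 'Music'  as category_name, 'c' as videoid, 100 as views
    union all
    select 'Comedy' as category_name, 'd' as videoid, 50  as views
) t;
```

In the Music partition, a and b tie at rank 1 and c gets rank 3; d is rank 1 in Comedy. Because of ties, a filter such as rk <= 10 can return more than 10 rows for a category; row_number() would be the alternative if exactly 10 rows per category were required.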

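-- The 20 most-viewed videos among those uploaded by the top 10 uploaders (ranked by the videos column)
select
    t2.uploader,
    t2.views
from (
    select
        uploader,
        videos
    from gulivideo_user_orc
    order by videos desc
    limit 10
) t1
join gulivideo_orc t2
    on t1.uploader = t2.uploader
order by t2.views desc
limit 20;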

-- The 20 most-viewed videos of each of the top 10 uploaders (ranked by the videos column)
select
    t3.uploader,
    t3.views,
    t3.rk
from (
    select
        t2.uploader,
        t2.views,
        rank() over (partition by t2.uploader order by t2.views desc) rk
    from (
        select
            uploader,
            videos
        from gulivideo_user_orc
        order by videos desc
        limit 10
    ) t1
    join gulivideo_orc t2
        on t1.uploader = t2.uploader
) t3
where t3.rk <= 20;

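-- Videos in the overall top 20 by views whose uploader is also among the top 10 uploaders (ranked by the videos column)
select
    t2.uploader,
    t1.views
from (
    select
        videoid,
        uploader,
        views
    from gulivideo_orc
    order by views desc
    limit 20
) t1
join (
    select
        uploader,
        videos
    from gulivideo_user_orc
    order by videos desc
    limit 10
) t2
    on t1.uploader = t2.uploader;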
