A small Hive analysis exercise

I. Topic: metric analysis case study

1. Requirement: compute the PV (page views) and UV (unique visitors) for each of the 24 hours of every day.

2. Analysis:

-> pv: count(url)
-> uv: count(distinct guid)

3. Data collection

    -- create the database and switch to it, so the tables end up under example.db
    create database example;
    use example;

    -- create the source table
    create table log_src(
      id string,
      url string,
      referer string,
      keyword string,
      type string,
      guid string,
      pageid string,
      moduleid string,
      linkid string,
      attachedinfo string,
      sessionid string,
      trackeru string,
      trackertype string,
      ip string,
      trackersrc string,
      cookie string,
      ordercode string,
      tracktime string,
      enduserid string,
      firstlink string,
      sessionviewno string,
      productid string,
      curmerchantid string,
      provinceid string,
      cityid string,
      fee string,
      edmactivity string,
      edmemail string,
      edmjobid string,
      ieversion string,
      platform string,
      internalkeyword string,
      resultsum string,
      currentpage string,
      linkposition string,
      buttonposition string
    )
    row format delimited fields terminated by '\t';

    -- load the data
    load data local inpath '/opt/datas/2015082818' into table log_src;

4. Data cleaning

    -- create the partitioned table
    create table log_part(
      id string,
      url string,
      guid string
    )
    partitioned by(day string, hour string)
    row format delimited fields terminated by '\t';

    -- first attempt: insert the columns to analyze straight from the source table
    insert into table log_part partition(day='28',hour='18')
    select id,url,guid from log_src where hour='18';
    -- this fails: the source data has no hour column, so it must be cleaned first

    -- create the cleaned table log_clean, deriving day and hour from tracktime
    create table log_clean as
    select
      id,
      url,
      guid,
      substring(tracktime,9,2) day,
      substring(tracktime,12,2) hour
    from log_src;

    -- load the data into log_part again, this time from log_clean
    insert into table log_part partition(day='28',hour='18')
    select id,url,guid from log_clean where day='28' and hour='18';

5. Data analysis

Requirement: compute the PV and UV for each of the 24 hours of every day.

    select day, hour, count(url) as pv, count(distinct guid) as uv
    from log_part
    group by day, hour;

Result:

    day  hour  pv     uv
    28   18    64972  23938

In practice the final metrics are written to a table of their own:

    -- stored tab-delimited to match --input-fields-terminated-by '\t' in the
    -- Sqoop export below (a plain CTAS would use Hive's default \001 delimiter)
    create table log_result
    row format delimited fields terminated by '\t'
    as
    select
      day,
      hour,
      count(url) as pv,
      count(distinct guid) as uv
    from log_part
    group by day, hour;

Use Sqoop to export the data of the log_result table to MySQL:

    bin/sqoop export \
      --connect jdbc:mysql://ai7-server1:3306/sqoop_test \
      --username root \
      --password 123456 \
      --table from_hive \
      --export-dir '/user/hive/warehouse/example.db/log_result' \
      --input-fields-terminated-by '\t'

6. Data presentation

Simply load the data from the MySQL table.
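To check the cleaning and aggregation logic without a Hive cluster, the same SQL can be tried on a few rows in an in-memory SQLite database. This is only a sketch under assumptions: the sample rows are made up, `tracktime` is assumed to look like `2015-08-28 18:10:00` (which is what the offsets 9 and 12 in the cleaning step imply), and the helper name `hourly_pv_uv` is hypothetical. SQLite's `substr` is 1-based, like Hive's `substring`, so the offsets carry over unchanged.

```python
import sqlite3

# Hypothetical sample rows (id, url, guid, tracktime); not taken from the real log file.
SAMPLE_ROWS = [
    ("1", "http://example.com/a", "g1", "2015-08-28 18:10:00"),
    ("2", "http://example.com/b", "g1", "2015-08-28 18:11:30"),
    ("3", "http://example.com/a", "g2", "2015-08-28 18:59:59"),
    ("4", "http://example.com/c", "g3", "2015-08-28 19:00:01"),
]


def hourly_pv_uv(rows):
    """Return [(day, hour, pv, uv), ...] computed with the same SQL shape as the Hive steps."""
    conn = sqlite3.connect(":memory:")
    # Reduced stand-in for log_src: only the four columns the analysis needs.
    conn.execute("create table log_src (id text, url text, guid text, tracktime text)")
    conn.executemany("insert into log_src values (?, ?, ?, ?)", rows)
    # Cleaning step: characters 9-10 are the day, characters 12-13 are the hour.
    conn.execute(
        "create table log_clean as "
        "select id, url, guid, "
        "substr(tracktime, 9, 2) as day, "
        "substr(tracktime, 12, 2) as hour "
        "from log_src"
    )
    # Analysis step: pv counts every row, uv counts distinct visitors per hour.
    result = conn.execute(
        "select day, hour, count(url) as pv, count(distinct guid) as uv "
        "from log_clean group by day, hour order by day, hour"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in hourly_pv_uv(SAMPLE_ROWS):
        print(row)
```

For these four rows, hour 18 has three page views from two distinct guids and hour 19 has one page view from one guid, which shows why pv and uv differ as soon as a visitor opens more than one page.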
