Data
record_time: call time
imei: base station ID
cell: handset ID
drop_num: dropped-call seconds
duration: total call duration in seconds
2011-07-13 00:00:00+08,356966,29448-37062,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,352024,29448-51331,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,353736,29448-51331,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,353736,29448-51333,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,351545,29448-51333,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,353736,29448-51343,1,0,0,8,0,g,0
2011-07-13 00:00:00+08,359681,29448-51462,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,354707,29448-51462,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,356137,29448-51470,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,352739,29448-51971,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,354154,29448-51971,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,127580,29448-51971,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,354264,29448-51973,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,354733,29448-51973,1,0,0,36,0,g,0
2011-07-13 00:00:00+08,356807,29448-51973,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,125470,29448-51973,1,0,0,13,0,g,0
2011-07-13 00:00:00+08,353530,29448-52061,1,0,0,46,0,g,0
2011-07-13 00:00:00+08,352417,29448-5231,1,0,0,2,0,g,0
Raw table:
create table cell_monitor(
record_time string,
imei string,
cell string,
ph_num int,
call_num int,
drop_num int,
duration int,
drop_rate double,
net_type string,
erl string
)row format delimited fields terminated by ','
stored as textfile;
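A `row format delimited` text table is read by splitting each line on the field terminator and assigning the fields to columns by position. The minimal Python sketch below imitates that for the sample rows. It is not Hive's SerDe. It assumes the 10-field layout that each sample row actually has: record_time, imei, cell, then seven more fields. Fields that are missing or do not parse as numbers come back as `None`, which stands in for the NULL Hive would produce. Because the mapping is purely positional, leaving a name out of `COLUMNS` would shift every later field by one.

```python
# Column order assumed for cell_monitor: the 10 fields found in each sample row.
COLUMNS = ["record_time", "imei", "cell", "ph_num", "call_num",
           "drop_num", "duration", "drop_rate", "net_type", "erl"]
INT_COLUMNS = {"ph_num", "call_num", "drop_num", "duration"}
DOUBLE_COLUMNS = {"drop_rate"}


def to_number(raw, cast):
    """Convert text with `cast`; text that does not parse becomes None (NULL)."""
    try:
        return cast(raw)
    except ValueError:
        return None


def parse_row(line, sep=","):
    """Split one line on `sep` and assign the fields to columns by position."""
    fields = line.rstrip("\n").split(sep)
    row = {}
    for i, name in enumerate(COLUMNS):
        raw = fields[i] if i < len(fields) else None  # missing field -> None
        if raw is None:
            row[name] = None
        elif name in INT_COLUMNS:
            row[name] = to_number(raw, int)
        elif name in DOUBLE_COLUMNS:
            row[name] = to_number(raw, float)
        else:
            row[name] = raw
    return row


row = parse_row("2011-07-13 00:00:00+08,353736,29448-51343,1,0,0,8,0,g,0")
print(row["cell"], row["ph_num"], row["duration"], row["net_type"])
```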
Create the result table:
create table cell_drop_monitor(
imei string,
total_drop_num int,
total_duration int,
d_rate double
)row format delimited fields terminated by '\t'
stored as textfile;
Load the raw data:
load data local inpath '/test/cdr_summ_imei_cell_info.csv' into table cell_monitor;
Aggregation query:
from cell_monitor cm
insert overwrite table cell_drop_monitor
select cm.imei,sum(cm.drop_num),sum(cm.duration),sum(cm.drop_num)/sum(cm.duration) d_rate
group by cm.imei
sort by d_rate desc;
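Explanation:
select the base station ID (cm.imei), sum the dropped-call seconds and the call durations, and divide the first sum by the second to get the drop rate, aliased d_rate;
group by cm.imei;
sort by d_rate desc sorts by d_rate in descending order. Note that sort by only orders the rows within each reducer; order by is needed for a single global ordering.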
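The aggregation can be mimicked in plain Python to see what each expression computes. This is a sketch over in-memory dicts, not how Hive executes the query. The rows in the usage lines are hypothetical, because every sample row shown above has drop_num = 0. A zero total duration gives `None`, mirroring the NULL Hive returns for division by zero. The result is put into one global descending order, which is what `order by` guarantees. `sort by` would only order the rows seen by each reducer.

```python
from collections import defaultdict


def drop_rate_by_imei(rows):
    """Per imei: (imei, sum(drop_num), sum(duration), d_rate), d_rate descending."""
    total_drop = defaultdict(int)
    total_duration = defaultdict(int)
    for r in rows:  # group by imei, summing both columns
        total_drop[r["imei"]] += r["drop_num"]
        total_duration[r["imei"]] += r["duration"]

    result = []
    for imei, drops in total_drop.items():
        seconds = total_duration[imei]
        # Hive returns NULL when dividing by zero; None plays that role here.
        d_rate = drops / seconds if seconds else None
        result.append((imei, drops, seconds, d_rate))

    # One global descending order by d_rate, with None placed last.
    result.sort(key=lambda t: (t[3] is None, -(t[3] or 0.0)))
    return result


# Hypothetical rows, not taken from the sample file.
rows = [
    {"imei": "A", "drop_num": 2, "duration": 10},
    {"imei": "B", "drop_num": 1, "duration": 2},
    {"imei": "A", "drop_num": 0, "duration": 10},
    {"imei": "C", "drop_num": 0, "duration": 0},
]
for line in drop_rate_by_imei(rows):
    print(line)
```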
Create a table:
create table docs(line string);
Load the data into the table:
load data local inpath '/test/wc.txt' into table docs;
Query each line split on spaces, which produces an array:
select split(line,' ') from docs;
Run it:
hive> select split(line,' ') from docs;
OK
["from","cell_monitor","cm","",""]
["insert","overwrite","table","cell_drop_monitor"]
[""]
["from","cell_monitor","cm","",""]
["insert",""]
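Hive's `split` keeps empty strings, including trailing ones, and that is where the `""` entries above come from. With a single literal space as the separator, Python's `str.split(' ')` behaves the same way, so the sketch below reproduces the output. The contents of `/test/wc.txt` are not given. `LINES` is a hypothetical reconstruction, obtained by joining each printed array back together with single spaces.

```python
import json

# Hypothetical reconstruction of /test/wc.txt: each printed array above,
# joined back together with single spaces.
LINES = [
    "from cell_monitor cm  ",
    "insert overwrite table cell_drop_monitor",
    "",
    "from cell_monitor cm  ",
    "insert ",
]


def split_on_space(line):
    """Like split(line, ' ') in Hive: empty strings, trailing ones included, are kept."""
    return line.split(" ")


for line in LINES:
    # Compact JSON reproduces the array format printed by the Hive CLI above.
    print(json.dumps(split_on_space(line), separators=(",", ":")))
```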
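explode(array): when one record holds an array with several elements, explode splits the array so that each element becomes a separate row:
hive> select explode(split(line,' ')) from docs;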
OK
from
cell_monitor
cm
insert
overwrite
table
cell_drop_monitor
from
cell_monitor
cm
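`explode` turns each array element into a separate output row. A line with five elements therefore yields five rows, and an empty array yields none. The Python sketch below flattens the split arrays the same way. It reuses the hypothetical `LINES` reconstruction of `/test/wc.txt`, built by joining each printed split array with single spaces. The empty strings become blank rows, which are easy to miss in the printed output above.

```python
from itertools import chain

# Hypothetical reconstruction of /test/wc.txt (each printed split array,
# joined back together with single spaces).
LINES = [
    "from cell_monitor cm  ",
    "insert overwrite table cell_drop_monitor",
    "",
    "from cell_monitor cm  ",
    "insert ",
]


def explode(arrays):
    """Like explode(array): emit one output row per array element."""
    return list(chain.from_iterable(arrays))


rows = explode(line.split(" ") for line in LINES)
for word in rows:
    print(word)  # empty strings print as blank lines
```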
Create the result table:
create table wc(word string, totalword int);
Word-count query:
from (select explode(split(line,' ')) as word from docs) w
insert into table wc
select word, count(1) as totalword
group by word
order by word;
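Result (the first row's word is the empty string, which split produces for consecutive or trailing spaces and for empty lines; it occurs 6 times):
hive> select * from wc;
OK
	6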
cell_drop_monitor 1
cell_monitor 2
cm 2
from 2
insert 2
overwrite 1
table 1
Time taken: 0.18 seconds, Fetched: 8 row(s)
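The whole word count fits in a few lines of Python: explode, then `group by word` with `count(1)`, then `order by word`. That makes the result table easy to check. Using a hypothetical `LINES` reconstruction of `/test/wc.txt`, the sketch yields the same eight rows, including the empty string counted 6 times. The reconstruction joins each array printed by the split query with single spaces. The `skip_empty` flag is a hypothetical addition. In HiveQL the equivalent would be a `where word != ''` filter in the word-count query.

```python
from collections import Counter

# Hypothetical reconstruction of /test/wc.txt (each printed split array,
# joined back together with single spaces).
LINES = [
    "from cell_monitor cm  ",
    "insert overwrite table cell_drop_monitor",
    "",
    "from cell_monitor cm  ",
    "insert ",
]


def word_count(lines, skip_empty=False):
    """explode(split(line, ' ')), then count(1) per word, then order by word."""
    words = (w for line in lines for w in line.split(" "))
    if skip_empty:  # hypothetical option, like adding where word != ''
        words = (w for w in words if w)
    return sorted(Counter(words).items())


for word, total in word_count(LINES):
    print(f"{word}\t{total}")
```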