hive> create database wordcount;
OK
Time taken: 2.313 seconds
hive> show databases;
OK
default
wordcount
Time taken: 0.926 seconds, Fetched: 2 row(s)
The official tutorial gives an example of creating a table; for more details, see the Hive Tutorial.
create table page_view(viewtime int, userid bigint,
                page_url string, referrer_url string,
                ip string comment 'ip address of the user')
comment 'this is the page view table'
partitioned by (dt string, country string)
row format delimited
        fields terminated by '1'
stored as sequencefile;
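Parameter notes
create table
Creates a table with the given name. If a table with that name already exists, an exception is thrown; add if not exists to suppress it.
row format delimited
Tells Hive how to split each row of the data files into fields.
fields terminated by '\t' sets the separator between the fields (columns) of a row to '\t';
collection items terminated by sets the separator between the elements of an array, map, or struct column, and lines terminated by '\n' sets the row separator. These usually do not need to be written.
comment
Attaches a description to a table or a column.
stored as
The file format in which the data is stored on HDFS.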
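To make the effect of fields terminated by more concrete, here is a minimal Python sketch (not Hive code) that splits one text row on a tab, roughly the way a delimited text table is read. The helper name parse_row and the sample row are hypothetical and only for illustration.

```python
def parse_row(line, field_delim="\t"):
    """Split one text row into its fields on the given delimiter."""
    # Strip only the trailing newline (the default row terminator);
    # everything else, including spaces, belongs to the field values.
    return line.rstrip("\n").split(field_delim)


# Hypothetical row with three tab-separated fields.
row = "1594000000\t42\thttp://example.com/index.html\n"
fields = parse_row(row)
print(fields)

# A row without any tab stays a single field.
single = parse_row("does the doctor who doctors the doctor\n")
print(single)
```

A row that contains no tab becomes a single field, which is why a table with one string column and a tab delimiter receives each line of a plain text file whole.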
Following the example, create the word table:
hive> create table if not exists word(context string)
    > comment 'word table'
    > row format delimited fields terminated by '\t'
    > stored as textfile;
OK
Time taken: 2.309 seconds
hive> show tables;
OK
word
Time taken: 0.088 seconds, Fetched: 1 row(s)
(base) [root@dw1 test]# pwd
/usr/local/test
(base) [root@dw1 test]# vi wordcount.txt
I prepared an English tongue twister. The contents of wordcount.txt are:
if one doctor doctors another doctor
does the doctor who doctors the doctor
doctor the doctor the way the doctor
he is doctoring doctors?
or does the doctor doctor
the way the doctor who doctors doctors?
Load the text data into the word table:
hive> load data local inpath '/usr/local/test/wordcount.txt' overwrite into table word;
Loading data to table default.word
OK
Time taken: 2.644 seconds
# View the contents of the word table
hive> select * from word;
OK
one doctor doctors another doctor
does the doctor who doctors the doctor
doctor the doctor the way the doctor
he is doctoring doctors?
or does the doctor doctor
the way the doctor who doctors doctors?
Time taken: 3.175 seconds, Fetched: 6 row(s)
Next, create a second table, wordcount, to hold the words we extract:
hive> create table if not exists wordcount(word string);
OK
Time taken: 0.167 seconds
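Use the split() function to split each line of the word table on spaces, turn every element of the resulting array into its own row with explode(), and insert the rows into the wordcount table:
# The execution engine is MapReduce; even a simple job like this feels slow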
hive> insert into table wordcount select explode(split(context, " ")) from word;
...
Loading data to table default.wordcount
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.29 sec   HDFS Read: 12923 HDFS Write: 506 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 290 msec
OK
Time taken: 107.251 seconds
# The output is long, so I have omitted part of it
hive> select * from wordcount;
OK
...
the
way
the
doctor
who
doctors
doctors?
Time taken: 0.271 seconds, Fetched: 40 row(s)
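The split-then-explode step above can be sketched in plain Python. This is only an approximation under stated assumptions: the function name split_and_explode and the sample lines are hypothetical, and Hive's split() takes a regular expression while this sketch uses a literal separator (for a single space the two should behave alike).

```python
def split_and_explode(lines, sep=" "):
    """Mimic explode(split(context, ' ')): one output row per array element."""
    rows = []
    for line in lines:
        # str.split with an explicit separator keeps empty strings, so
        # consecutive or trailing separators produce '' elements.
        rows.extend(line.split(sep))
    return rows


# Hypothetical input: the second line has a double space and a trailing space.
lines = ["the way the doctor", "who doctors  doctors? "]
words = split_and_explode(lines)
for w in words:
    print(repr(w))
```

Splitting on exactly one space means that double or trailing spaces in the input turn into empty-string rows, so the number of rows can be larger than the number of visible words.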
Finally, use the count() function to count how many times each word occurs:
hive> select word, count(word) from wordcount group by word;
...
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.37 sec   HDFS Read: 12964 HDFS Write: 373 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 370 msec
OK
 5
or 1
another 1
doctor 10
doctoring 1
doctors 3
doctors? 2
does 2
he 1
is 1
one 1
the 8
way 2
who 2
Time taken: 70.926 seconds, Fetched: 14 row(s)
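The group-by count can be mimicked in a few lines of Python with collections.Counter. This is a sketch, not what Hive runs internally; the function name count_words and the sample word list are hypothetical.

```python
from collections import Counter


def count_words(words):
    """Mimic: select word, count(word) from wordcount group by word."""
    counts = Counter(words)
    # Sort by key so the result order is deterministic.
    return sorted(counts.items())


# Hypothetical rows, including two empty strings.
words = ["the", "doctor", "", "the", "way", "", "doctor", "the"]
result = count_words(words)
for word, n in result:
    print(repr(word), n)
```

An empty string is a value like any other, so it forms its own group. That is one plausible explanation for the blank row with count 5 in the Hive output above, assuming wordcount.txt contained double or trailing spaces. As a sanity check, the counts in that output add up to 40, the same as the 40 rows fetched from wordcount.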