Implementing a Hive WordCount Example with HQL

hive> create database wordcount;
OK
Time taken: 2.313 seconds
hive> show databases;
OK
default
wordcount
Time taken: 0.926 seconds, Fetched: 2 row(s)
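
The later transcripts in this post report the tables as default.word and default.wordcount, so the tables end up in the default database rather than in wordcount. If the intent is to keep them in the new database, a minimal sketch (using only standard Hive statements, and assuming it is run before any table is created) would be:

```sql
-- Switch the session to the newly created database so that
-- later create table statements land in it.
use wordcount;

-- Confirm which database the session is currently using.
select current_database();
```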

The official tutorial gives an example of creating a table; for more details, see the Hive Tutorial.

create table page_view(viewtime int, userid bigint,
                page_url string, referrer_url string,
                ip string comment 'ip address of the user')
comment 'this is the page view table'
partitioned by (dt string, country string)
row format delimited
        fields terminated by '1'
stored as sequencefile;

Parameter description

create table

Creates a table with the specified name. If the table already exists, an exception is thrown; add if not exists to suppress it.

row format delimited

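Tells Hive how to split the data files into rows and fields when the table is read.

fields terminated by '\t' means that '\t' is the separator between fields (columns) within a row.

collection items terminated by sets the separator between the elements of a collection column (array, map, struct); it usually does not need to be written. Rows themselves are separated by '\n' by default.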

comment

Attaches a descriptive comment to a table or to a column.

stored as

The format in which the data files are stored on HDFS.
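
To make the clauses above concrete, here is a small sketch that combines them in one statement. The table page_tags and its columns are hypothetical and exist only for illustration; they are not part of the original walkthrough.

```sql
-- Hypothetical table, for illustration only.
create table if not exists page_tags (
    page_url string        comment 'URL of the page',
    tags     array<string> comment 'tags attached to the page'
)
comment 'example table with a collection column'
row format delimited
    fields terminated by '\t'            -- separates columns within a row
    collection items terminated by ','   -- separates elements inside tags
    lines terminated by '\n'             -- separates rows (the default)
stored as textfile;
```

With this layout, a line of the data file would hold the URL, a tab, and then a comma-separated list of tags.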

Following the example, create the word table:

hive> create table if not exists word(context string)
    > comment 'word table'
    > row format delimited fields terminated by '\t'
    > stored as textfile;
OK
Time taken: 2.309 seconds
hive> show tables;
OK
word
Time taken: 0.088 seconds, Fetched: 1 row(s)

(base) [root@dw1 test]# pwd
/usr/local/test
(base) [root@dw1 test]# vi wordcount.txt

I prepared an English tongue twister; the contents of wordcount.txt are as follows:

if one doctor doctors another doctor does the doctor who doctors the doctor doctor the doctor the way the doctor he is doctoring doctors? or does the doctor doctor

the way the doctor who doctors doctors?

Load the text data into the word table:

hive> load data local inpath '/usr/local/test/wordcount.txt' overwrite into table word;
Loading data to table default.word
OK
Time taken: 2.644 seconds

-- View the contents of the word table
hive> select * from word;
OK
one doctor doctors another doctor does the doctor who doctors the doctor doctor the doctor the way the doctor he is doctoring doctors? or does the doctor doctor the way the doctor who doctors doctors?
Time taken: 3.175 seconds, Fetched: 6 row(s)

Next, create another table, wordcount, to hold the individual words that we are going to count:

hive> create table if not exists wordcount(word string);
OK
Time taken: 0.167 seconds

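Use the split() function to split each line of the word table on spaces, use explode() to turn every element of the resulting array into its own row, and insert those rows into the wordcount table: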

-- The execution engine is MapReduce; even a simple job like this feels slow to run
hive> insert into table wordcount select explode(split(context, " ")) from word;
...
Loading data to table default.wordcount
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.29 sec   HDFS Read: 12923 HDFS Write: 506 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 290 msec
OK
Time taken: 107.251 seconds

-- The output is fairly long, so part of it is omitted
hive> select * from wordcount;
OK
...
the
way
the
doctor
who
doctors
doctors?
Time taken: 0.271 seconds, Fetched: 40 row(s)
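
The same split-and-explode step is often written with lateral view, which also makes it easy to drop empty tokens. The following is a sketch rather than what was run above: the aliases t and w are made up for illustration, and splitting on the regular expression '\\s+' (one or more whitespace characters) instead of a single space is an assumption about how the input should be tokenized.

```sql
-- split() takes a regular expression; '\\s+' matches runs of whitespace.
-- lateral view explode() emits one row per array element.
select w as word
from word
lateral view explode(split(context, '\\s+')) t as w
where w != '';   -- skip empty strings produced by blank or padded lines
```

Prefixing this query with insert into table wordcount would fill the wordcount table without the empty rows.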

Finally, use the count() function to count how many times each word occurs:

hive> select word, count(word) from wordcount group by word;
...
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.37 sec   HDFS Read: 12964 HDFS Write: 373 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 370 msec
OK
            5
or          1
another     1
doctor      10
doctoring   1
doctors     3
doctors?    2
does        2
he          1
is          1
one         1
the         8
way         2
who         2
Time taken: 70.926 seconds, Fetched: 14 row(s)
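
For comparison, the split, explode, and count steps can also be combined into one statement without the intermediate wordcount table. This is only a sketch of an alternative, not the approach used above; the aliases t, w, and cnt are illustrative, and the order by clause is an addition that sorts the most frequent words first.

```sql
-- Explode every line into words in a subquery, then group and count.
select w as word, count(*) as cnt
from (
    select explode(split(context, ' ')) as w
    from word
) t
group by w
order by cnt desc;
```

Because it skips the insert step, this variant launches MapReduce work for one query instead of two, at the cost of not keeping the individual words in a table for later reuse.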

