Hive Mini-Exercise: Implementing Word Count

su -l hadoop

# enter the password

vi word.txt # create a new file, word.txt, to use as our data file

Enter a few words, using a single space " " as the separator:

hello world

hello terese

hello myfriend

hello everyone

Press Esc

:wq to save and quit

hive # start the Hive command line

create table text (line string); # create a table named text

load data local inpath '/home/hadoop/word.txt' into table text; # load the data into the table

select * from text; # view the contents of the text table

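How do we count the words that appear in these lines?

First split each line of text into individual words with the split function, which yields an array whose elements are the words. Then use the explode function to turn each element of that array into its own row. The result is in a form that Hive can process directly with group by.

The split function cuts each line of text into individual words.

The explode function expands an array into rows: each element of the array produced by split becomes one row.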

select explode(split(line, ' ')) as word from text;
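To make the two steps concrete, here is a small Python sketch that imitates what split and explode do to the four sample lines. It only illustrates the semantics and is not how Hive executes the query; the names lines, split_rows and exploded are made up for this example. Hive's split treats its second argument as a regular expression, which for a single space behaves the same as Python's literal split.

```python
# Sample rows from word.txt, one string per row of the text table.
lines = ["hello world", "hello terese", "hello myfriend", "hello everyone"]

# split(line, ' '): each row becomes an array of words.
split_rows = [line.split(" ") for line in lines]

# explode(...): every array element becomes a row of its own.
exploded = [word for row in split_rows for word in row]

print(split_rows[0])   # ['hello', 'world']
print(len(exploded))   # 8
```

Four input rows with two words each give eight output rows, one word per row, which is the shape group by needs.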

select w.word, count(*) from (select explode(split(line, ' ')) as word from text) as w group by w.word;

# group by is needed to aggregate the rows into a count per word.

select w.word, count(*) c from (select explode(split(line, ' ')) as word from text) as w group by w.word order by c desc limit 3;

# sort in descending order and take the top three
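As a cross-check on the expected result, the same count, sort and top-three logic can be sketched in Python with collections.Counter. This is only an illustration on the sample data from word.txt, not what Hive runs internally; the variable names are made up for this example.

```python
from collections import Counter

# Sample rows from word.txt, one string per row of the text table.
lines = ["hello world", "hello terese", "hello myfriend", "hello everyone"]

# split + explode: one word per element.
words = [word for line in lines for word in line.split(" ")]

# group by word with count(*).
counts = Counter(words)

# order by c desc limit 3.
top3 = counts.most_common(3)

print(top3[0])  # ('hello', 4)
```

Only hello appears more than once, so the other two places go to words with a count of 1. Which of those tied words Hive returns is not guaranteed by the query, since order by c desc does not break ties.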

create table count as select w.word, count(*) c from (select explode(split(line, ' ')) as word from text) as w group by w.word order by c desc limit 3;

# store the query result in another table

select * from count; # view the count table
