Hive Basics: Getting Started

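I have not been assigned to a project yet, so I tidied up the Hive basics from my earlier self-study of big data. It doubles as a review of what I learned, and I am sharing it here for anyone who finds it useful. If you spot a mistake, please point it out.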

Creating a partitioned table

create external table if not exists log(
empno int,
ename string,
job string,
deptno int
)partitioned by (month string)
row format delimited
fields terminated by '\t';

load data local inpath '/home/hadoop/data/log' into table log partition(month='201509');

select * from log where month = '201509';
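
Each partition is stored as its own subdirectory under the table's location, so a filter on the partition column lets Hive read only the matching directories. The statements below are a small sketch of inspecting and managing partitions on the log table above; the value 201510 is made up for illustration.

```sql
-- list the partitions that currently exist for the table
show partitions log;

-- register an empty partition ahead of loading data into it
alter table log add if not exists partition (month='201510');

-- remove the partition again; for an external table this by default
-- drops only the metadata and leaves the files in place
alter table log drop if exists partition (month='201510');
```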

Bucketed tables

create table bucked_user(
id int,
name string
)clustered by(id) into 4 buckets
row format delimited fields terminated by '\t'
stored as textfile;

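Also note that bucketing has to be switched on before writing to a bucketed table: set hive.enforce.bucketing = true; (this applies to Hive 1.x and earlier; from Hive 2.0 the property was removed and bucketing is always enforced).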
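
On the Hive versions where this setting applies, load data only moves files and does not split rows into buckets, so a bucketed table is normally filled with insert ... select from another table. The following is a minimal sketch under that assumption; user_staging is a hypothetical staging table that is not part of the original notes.

```sql
-- hypothetical staging table with the same columns as bucked_user
create table if not exists user_staging(
id int,
name string
)
row format delimited fields terminated by '\t';

set hive.enforce.bucketing = true;

-- rows are hashed on id and written into 4 bucket files
insert overwrite table bucked_user
select id, name from user_staging;

-- sample the table by reading only the first of the 4 buckets
select * from bucked_user tablesample(bucket 1 out of 4 on id);
```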

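Sorting

order by: global sort; all rows go through a single reducer.

sort by: sorts the rows inside each reducer; the combined output is not globally sorted.

distribute by: decides which reducer each row is sent to, similar to the partitioner in MapReduce; normally used together with sort by.

cluster by: shorthand for the case where the distribute by and sort by columns are the same. Note: distribute by must be written before sort by.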
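
To make the four clauses concrete, here is a sketch that runs them against the log table defined earlier. The reducer count of 3 is an arbitrary choice so that the per-reducer behaviour becomes visible; note also that cluster by always sorts ascending.

```sql
-- global order: a single reducer sorts every row
select * from log order by empno desc;

-- request 3 reducers for the following queries (arbitrary value)
set mapreduce.job.reduces = 3;

-- each reducer's output is sorted by empno, the combined result is not
select * from log sort by empno;

-- rows with the same deptno go to the same reducer,
-- then each reducer sorts its rows by empno
select * from log distribute by deptno sort by empno desc;

-- same as: distribute by deptno sort by deptno
select * from log cluster by deptno;
```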

Importing data

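load data [local] inpath 'filepath' [overwrite] into table tablename [partition (partcol1 = val1, partcol2 = val2)]

[local]: load the file from the local file system; otherwise filepath is an HDFS path.
[overwrite]: whether to overwrite the data already in the table.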
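
The two options combine as in the sketch below, which reuses the log table from the first section. The HDFS path is hypothetical.

```sql
-- local file: copied from the client machine into the partition's directory
load data local inpath '/home/hadoop/data/log' into table log partition(month='201509');

-- HDFS file: moved (not copied) into the partition's directory,
-- replacing whatever the partition held before
load data inpath '/user/hadoop/data/log' overwrite into table log partition(month='201509');
```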

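A table can also be filled from a query: create an empty table with the same schema, then insert the query result into it.

create table default.cli like default.emp;

insert into table default.cli
select * from default.emp;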
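
Two closely related variants are sketched here: create table ... as select, which creates and fills a table in one statement, and insert overwrite, which replaces the existing rows instead of appending. The name default.cli_copy is hypothetical.

```sql
-- create the table and fill it from the query result in a single statement
create table default.cli_copy
as select * from default.emp;

-- replace, rather than append to, the contents of an existing table
insert overwrite table default.cli
select * from default.emp;
```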

Exporting data

insert overwrite [local] directory  '/user/data'
select * from default.emp;
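
With local the directory is on the client machine; without it the directory is on HDFS. By default the exported files use Hive's control-character field delimiter, so a row format clause is usually added to make them readable. A minimal sketch follows; the output path is hypothetical, and anything already in that directory is overwritten.

```sql
-- write tab-separated files to a directory on the local machine
insert overwrite local directory '/home/hadoop/export/emp'
row format delimited fields terminated by '\t'
select * from default.emp;
```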

Query syntax

select [all | distinct] select_expr, select_expr, ...
from table_reference
[where where_condition]
[group by col_list]
[order by col_list]
[cluster by col_list | [distribute by col_list] [sort by col_list]]
[limit number]
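
As an illustration of how the clauses fit together, the query below counts employees per department in one month's partition of the log table defined earlier. The limit of 5 is an arbitrary choice.

```sql
-- employees per department in one partition, largest departments first
select deptno, count(*) as emp_count
from log
where month = '201509'
group by deptno
order by emp_count desc
limit 5;
```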

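User-defined functions (UDF)

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class ToLowerCase extends UDF {
    // Hive finds evaluate() by reflection; a null input yields a null result
    public Text evaluate(Text str) {
        return str == null ? null : new Text(str.toString().toLowerCase());
    }
}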

Note: a UDF's evaluate method must declare a return type. It may return null, but the return type cannot be void.

How to use it: run add jar filepath; and then create temporary function my_low as 'fully qualified class name';
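
Put together, a session might look like the sketch below. The jar path and the package name com.example.hive.udf are hypothetical; substitute the real jar location and the fully qualified name of your own class.

```sql
-- make the jar available to the session and register the function
add jar /home/hadoop/jars/hive-udf.jar;
create temporary function my_low as 'com.example.hive.udf.ToLowerCase';

-- call it like any built-in function
select ename, my_low(ename) as lower_ename from log limit 10;

-- a temporary function disappears when the session ends,
-- but it can also be dropped explicitly
drop temporary function if exists my_low;
```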
