Hive Basic Operations

List, create, and drop databases:

show databases;

create database test;

drop database test;

Create a table and view its structure:

create table student(id int,name string);

desc student;

Truncate a table, drop a table, and list all tables:

truncate table student;

drop table student;

show tables;

Run a Hive script file: hive -f

vim create_table.hql

create database test;

hive -f create_table.hql

Run a Hive statement from the shell: hive -e

hive -e 'show databases'

hive -e 'drop database test'

List the files in the HDFS root directory:

dfs -ls /;

Create a directory:

dfs -mkdir /data;

Upload a file:

dfs -put /usr/local/data/emp.csv /data/emp;

Create a new database named train:

create database train;

use train;

Create the internal (managed) table emp_in, changing the default field delimiter to ',':

create table emp_in(

empno int,

ename string,

job string,

mgr int,

hiredate string,

sal int,

comm int,

deptno int)

row format delimited fields terminated by ',';

Load data:

Load from the local Linux file system:

load data local inpath '/usr/local/data/emp.csv' overwrite into table emp_in;

Load from HDFS:

load data inpath '/data/emp' overwrite into table emp_in;

View the table's data, then drop the table:

select * from emp_in;

drop table emp_in;

Create a table and populate it from another table (this triggers a MapReduce job):

create table emp_data as select empno,ename from emp_in;

External tables keep their data at a specified location on HDFS, which suits sharing data between departments:

create external table emp_out(

… …,

… …)

row format delimited fields terminated by ','

location '/data';

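Note: Hive treats every file under this directory as table data, so the directory should hold only this table's data files, all in the same format.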
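
A short sketch of why this suits sharing, assuming emp_out was created as above: dropping an external table removes only its metadata, and the files at the location stay in place for other users.

```sql
-- Drop the external table definition; only the metadata is removed.
drop table emp_out;

-- The data files under /data are still there after the drop.
dfs -ls /data;
```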

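Partitioned tables serve a purpose similar to an index in MySQL: when the data volume is large, partitioning speeds up queries that filter on the partition column: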

create table emp_part(

… …)

partitioned by(deptno int)

row format delimited fields terminated by ',';

Insert data:

insert into table emp_part partition(deptno=10) select empno,ename,job,mgr,hiredate,sal,comm from emp_in where deptno=10;

insert into table emp_part partition(deptno=20) select empno,ename,job,mgr,hiredate,sal,comm from emp_in where deptno=20;

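Use explain to view the execution plan of an HQL query and compare the query efficiency of the internal table and the partitioned table.

The data size figure in the plan, the amount of data scanned, is a useful reference when optimizing HQL queries.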
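
As a sketch of that comparison, the two plans below could be generated for the same filter, assuming emp_in and emp_part have both been loaded as above. The plan text and the data size figures depend on the Hive version and on table statistics, so no output is shown here.

```sql
-- Plan for the unpartitioned internal table: the whole table is scanned.
explain select * from emp_in where deptno=10;

-- Plan for the partitioned table: only the deptno=10 partition should be read.
explain select * from emp_part where deptno=10;
```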

Drop a partition:

alter table emp_part drop partition(deptno=10);

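Drawback: with static partitions you must specify each partition value by hand before loading data, which is very inconvenient.

Dynamic partitioning creates the partitions automatically from the values of the partition key.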

Enable dynamic partitioning:

set hive.exec.dynamic.partition=true;

Set the dynamic partition mode:

set hive.exec.dynamic.partition.mode=nonstrict;

The table itself is created the same way as a statically partitioned table:

create table emp_dyn(

… …)

partitioned by(deptno int)

row format delimited fields terminated by ',';

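Load data, naming only the partition key. Use insert into to append, or insert overwrite to replace existing data. The partition column (deptno) must be the last column returned by the select:

insert into table emp_dyn partition(deptno) select * from emp_in;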
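
One way to check the result, as a sketch: list the partitions Hive created. Which partitions appear depends on the deptno values present in emp_in.

```sql
-- Each partition is stored as its own subdirectory (deptno=<value>)
-- under the table's directory.
show partitions emp_dyn;
```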

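Bucketed tables assign each row to a bucket by hashing the bucketing column and taking the remainder modulo the number of buckets. This spreads the data out and helps avoid data skew to some extent.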

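Set the configuration property that enforces bucketing (needed on Hive 1.x; Hive 2.0 and later always enforce it):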

set hive.enforce.bucketing = true;

Create-table statement:

create table emp_buk(

empno int,

… …,

… …)

clustered by (empno) into 3 buckets

row format delimited fields terminated by ',';

Load data:

insert into table emp_buk select * from emp_in;
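
To look at a single bucket, a bucket-sampling query can be used. This is a sketch that assumes emp_buk was populated with bucketing enforced; which rows land in which bucket depends on Hive's hash function.

```sql
-- Read only the first of the three buckets, bucketed on empno.
select * from emp_buk tablesample(bucket 1 out of 3 on empno);
```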

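A view in Hive is the same concept as a view in MySQL: a logical representation of a set of data, in essence the result set of a query statement.

Creating a view does not launch a MapReduce job, and views can simplify complex query logic such as nested queries.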

Create a view, then query it:

create view view_dept as select * from emp_in where deptno=10;

select * from view_dept;
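
A sketch of managing the view afterwards: desc shows its columns, and drop view removes only the view definition, leaving emp_in untouched.

```sql
-- Show the columns exposed by the view.
desc view_dept;

-- Remove the view definition; the underlying table emp_in is not affected.
drop view view_dept;
```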

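Export to the local Linux file system. The previous contents of the export directory are completely overwritten, so choose a new directory whenever possible.

insert overwrite local directory '/usr/local/output'

row format delimited fields terminated by ','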

select * from emp_in;

Export to HDFS:

insert overwrite directory '/output'

row format delimited fields terminated by ','

select * from emp_in;

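For both loading and exporting data, the only difference between the commands for the two systems is whether the keyword 'local' is present.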
