Hadoop Hive: Basic Database and Table Operations

Create a database

hive> create database if not exists db1;
hive> create schema if not exists db2;

Drop a database

hive> drop database db2;
hive> drop schema db1;

Create a table

create table if not exists employee (
  eid int,
  name string,
  salary string,
  job string,
  year int
)
comment 'employee details'
row format delimited
  fields terminated by '\t'
  lines terminated by '\n'
stored as textfile;

Load data into the table

Prepare the data file sample.txt

[root@g12-1 ~]# cat /tmp/sample.txt 
1201	gopal	45000	technicalmanager	2013	
1202	manisha	45000	proofreader	2013
1203	masthanvali	40000	technicalwriter	2014
1204	kiran	40000	hradmin	2014
[root@g12-1 ~]#

Load the file into the table

hive> load data local inpath '/tmp/sample.txt' overwrite into table employee;
Loading data to table db1.employee
Table db1.employee stats: [numFiles=1, numRows=0, totalSize=150, rawDataSize=0]
OK
Time taken: 0.354 seconds
hive> select * from employee;
OK
1201	gopal	45000	technicalmanager	2013
1202	manisha	45000	proofreader	2013
1203	masthanvali	40000	technicalwriter	2014
1204	kiran	40000	hradmin	2014
Time taken: 0.094 seconds, Fetched: 4 row(s)
hive>

HiveQL

select...where

hive> select * from employee where salary > 40000;

order by

hive> select * from employee order by eid;

group by

hive> select salary,count(salary) from employee group by salary;
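As a rough illustration of what this query computes, the following Python sketch counts rows per salary over the four sample rows loaded earlier. It only mimics the result, not how Hive executes the query; the rows list is copied from sample.txt, and the function name count_by_salary is made up for this example.

```python
from collections import Counter

# Rows copied from /tmp/sample.txt: (eid, name, salary, job, year).
# salary is kept as a string because the table declares it as string.
rows = [
    (1201, "gopal", "45000", "technicalmanager", 2013),
    (1202, "manisha", "45000", "proofreader", 2013),
    (1203, "masthanvali", "40000", "technicalwriter", 2014),
    (1204, "kiran", "40000", "hradmin", 2014),
]


def count_by_salary(rows):
    """Mimic: select salary, count(salary) from employee group by salary."""
    return dict(Counter(salary for _, _, salary, _, _ in rows))


salary_counts = count_by_salary(rows)
print(salary_counts)
```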

select...join

hive> select c.id, c.name, c.age, o.amount from customers c join orders o on (c.id = o.customer_id);

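Partitioned tables

In Hive, a database is a directory and a table is also a directory; each partition of a partitioned table is a subdirectory of the table directory.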

create table xx (...) partitioned by (...);

alter table *** add partition (...) ...;

load data local inpath ... into table *** partition (...);
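The directory layout described above can be sketched in Python. This is an illustration under stated assumptions: /user/hive/warehouse is Hive's default value of hive.metastore.warehouse.dir and may differ on a given cluster, and the table name employee_by_year, the partition column year, and the helper partition_path are hypothetical.

```python
import posixpath

# Assumption: Hive's default warehouse root (hive.metastore.warehouse.dir).
WAREHOUSE = "/user/hive/warehouse"


def partition_path(database, table, partition_spec):
    """Return the directory that would hold one partition of a managed table.

    partition_spec is an ordered list of (column, value) pairs, for example
    [("year", 2013)]; each pair becomes one nested "column=value" directory.
    """
    parts = [WAREHOUSE]
    # Databases other than "default" live in a "<name>.db" directory.
    if database != "default":
        parts.append(database + ".db")
    parts.append(table)
    parts.extend("{}={}".format(col, val) for col, val in partition_spec)
    return posixpath.join(*parts)


# Hypothetical partitioned table, for illustration only.
example = partition_path("db1", "employee_by_year", [("year", 2013)])
print(example)
```

Each additional partition column adds one more directory level, so a table partitioned by two columns has two nested "column=value" directories under the table directory.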

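Bucketed tables

create table *** (...) ... clustered by (column_name) into n buckets;

In a bucketed table each bucket is a data file, and a row is assigned to a bucket by hashing the clustered-by column.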
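A minimal Python sketch of hash-based bucket assignment, assuming an int clustering column and Hive's original bucketing scheme, in which the hash of an int is the value itself and the bucket index is (hash & Integer.MAX_VALUE) % n. Newer Hive releases may use a different hash function, so treat this as a conceptual illustration; the function name bucket_for_int is hypothetical.

```python
def bucket_for_int(value, num_buckets):
    """Bucket index for an int key: (hash & 0x7FFFFFFF) % num_buckets.

    Assumption: the hash of an int is the int itself, as in Hive's
    original bucketing scheme. Masking with 0x7FFFFFFF keeps the
    result non-negative for negative keys.
    """
    return (value & 0x7FFFFFFF) % num_buckets


# eid values from sample.txt, spread over 2 buckets.
eids = [1201, 1202, 1203, 1204]
assignment = {eid: bucket_for_int(eid, 2) for eid in eids}
print(assignment)
```

All rows with the same key always land in the same bucket file, which is what makes bucket-based sampling and bucket joins possible.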

HiveQL tuning

1) explain: show the execution plan

explain extended select count(*) from employee;

explain formatted select count(*) from employee;

2) Enable limit optimization: avoid a full table scan by sampling the data

select * from employee limit 1,2;

Set hive.limit.optimize.enable=true

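3) join

Use a map-side join (/*+ mapjoin(table) */) when one table is small, or use /*+ streamtable(table) */ to name the table that should be streamed.

In a join query, order the tables so that their sizes increase from left to right: Hive buffers the earlier tables and streams the last one.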
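The idea behind a map-side join can be sketched in Python: the small table is loaded into an in-memory hash table, and the large table is streamed past it row by row, so the rows do not need to be shuffled by join key. The column names follow the customers/orders join shown earlier; the sample rows and the function name map_side_join are hypothetical, and the sketch assumes customers.id is unique.

```python
def map_side_join(customers, orders):
    """Inner join on customers.id = orders.customer_id.

    customers: small table, list of (id, name, age), held in memory.
    orders: large table, iterable of (customer_id, amount), streamed.
    Assumption: customers.id is unique.
    """
    # Build phase: index the small table by join key.
    by_id = {cid: (name, age) for cid, name, age in customers}
    # Probe phase: stream the large table and look up each key.
    for customer_id, amount in orders:
        match = by_id.get(customer_id)
        if match is not None:  # inner join: unmatched orders are dropped
            name, age = match
            yield (customer_id, name, age, amount)


# Hypothetical sample rows, for illustration only.
customers = [(1, "alice", 30), (2, "bob", 41)]
orders = [(1, 250), (2, 75), (1, 40), (3, 999)]
joined = list(map_side_join(customers, orders))
print(joined)
```

Only the small table has to fit in memory, which is why the larger table should be the one that is streamed.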

4) Enable local mode, which runs all tasks on a single machine

Suitable when the data set is small.

hive.exec.mode.local.auto=true // default: false

hive> set hive.exec.mode.local.auto=true;

...
