Tables in Hive and How to Create Them

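An internal table is also called a managed table (managed_table). By default its data is stored under /user/hive/warehouse, although a different directory can be specified with location. Dropping the table deletes both the table data and the metadata.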

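create table if not exists t_user(
id int,
name string,
sex boolean,              -- column name is masked as *** in the source; sex is assumed
age int,
salary double,
hobbies array<string>,    -- element type lost in the source; string is assumed
card map<string,string>,  -- key/value types lost in the source; string is assumed
address struct<country:string,city:string>
)
row format delimited
fields terminated by ','
collection items terminated by '|'
map keys terminated by '>'
lines terminated by '\n'
stored as textfile;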
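As a rough illustration of how the three delimiters fit together, the sketch below shows one hypothetical line of input and a query that reads the complex columns. It assumes hobbies is an array of strings, card is a map from string to string, and address is a struct with country and city fields. The sample values and the ICBC map key are invented for illustration.

```sql
-- Hypothetical input line (values are made up):
--   1,zhangsan,true,20,5000.0,reading|music,ICBC>001|CCB>002,cn|beijing
--   ','  separates the columns
--   '|'  separates array elements, map entries and struct fields
--   '>'  separates a map key from its value

-- Show the table type (MANAGED_TABLE for an internal table) and its location.
describe formatted t_user;

-- Read individual elements of the complex columns.
select name,
       hobbies[0]   as first_hobby,   -- array index starts at 0
       card['ICBC'] as icbc_card,     -- map lookup by key (hypothetical key)
       address.city as city           -- struct field access
from t_user;
```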

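An external table (external_table) lets you specify the directory location yourself (location) when the table is created. Dropping the table deletes only the metadata; the table data is left in place.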

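create external table if not exists t_user1(
id int,
name string,
sex boolean,              -- column name is masked as *** in the source; sex is assumed
age int,
salary double,
hobbies array<string>,    -- element type lost in the source; string is assumed
card map<string,string>,  -- key/value types lost in the source; string is assumed
address struct<country:string,city:string>
)
row format delimited
fields terminated by ','
collection items terminated by '|'
map keys terminated by '>'
lines terminated by '\n'
stored as textfile
location 'hdfs:///hive_db/t_user';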
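The difference in drop behavior can be checked directly. The following is a minimal sketch, assuming a Hive CLI or Beeline session in which the dfs command is available; it is not taken from the original session.

```sql
-- The "Table Type" row should read EXTERNAL_TABLE.
describe formatted t_user1;

-- Drops only the metadata of the external table.
drop table t_user1;

-- The files under the table location should still be listed afterwards.
dfs -ls hdfs:///hive_db/t_user;
```

Re-running the create external table statement with the same location should make the existing files queryable again.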

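A Hive table corresponds to a specific directory on HDFS. By default a query scans the entire table, which is costly in both time and resources. A partition is a subdirectory of the table's directory on HDFS, and the data is stored in these subdirectories by partition. If the where clause of a query contains a condition on the partition column, Hive reads only the matching partitions instead of scanning the whole table directory, so a well-designed partitioning scheme can greatly improve query speed and performance.

Partitioned tables are not unique to Hive; the concept is very common. In the widely used Oracle database, for example, queries slow down as the amount of data in a table keeps growing, and the table can then be partitioned. After partitioning, the table is still logically one complete table, but its data is stored across multiple tablespaces (physical files), so a query no longer has to scan the whole table every time, which improves query performance.

In Hive, a partitioned table is created with the partitioned by clause. A table can have one or more partition columns, and Hive creates a separate data directory for each distinct combination of partition column values.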

create external table t_employee(
id int,
name string,
job string,
manager int,
hiredate timestamp,
salary decimal(7,2)
)
partitioned by (deptno int)
row format delimited
fields terminated by "\t"
location '/hive/t_employee';

7369 smith clerk 7902 1980-12-17 00:00:00 800.00
7499 allen salesman 7698 1981-02-20 00:00:00 1600.00 300.00
7521 ward salesman 7698 1981-02-22 00:00:00 1250.00 500.00
7566 jones manager 7839 1981-04-02 00:00:00 2975.00
7654 martin salesman 7698 1981-09-28 00:00:00 1250.00 1400.00
7698 blake manager 7839 1981-05-01 00:00:00 2850.00
7782 clark manager 7839 1981-06-09 00:00:00 2450.00
7788 scott analyst 7566 1987-04-19 00:00:00 1500.00
7839 king president 1981-11-17 00:00:00 5000.00
7844 turner salesman 7698 1981-09-08 00:00:00 1500.00 0.00
7876 adams clerk 7788 1987-05-23 00:00:00 1100.00
7900 james clerk 7698 1981-12-03 00:00:00 950.00
7902 ford analyst 7566 1981-12-03 00:00:00 3000.00
7934 miller clerk 7782 1982-01-23 00:00:00 1300.00

0: jdbc:hive2://centos:10000> load data local inpath '/root/baizhi/t_employee' overwrite into table t_employee partition(deptno=10);
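Once the load has finished, the effect of partitioning can be inspected with a few statements. This is a sketch under the assumption that the load above succeeded and that the dfs command is available in the session. The deptno=20 partition is hypothetical and only shows that each partition value gets its own subdirectory.

```sql
-- List the partitions known to the metastore; deptno=10 should appear.
show partitions t_employee;

-- The filter on the partition column lets Hive read only the
-- deptno=10 subdirectory instead of the whole table directory.
select id, name, salary
from t_employee
where deptno = 10;

-- Register another (hypothetical, initially empty) partition.
alter table t_employee add if not exists partition (deptno = 20);

-- Expect subdirectories such as deptno=10 and deptno=20.
dfs -ls /hive/t_employee;
```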

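A partitioned table isolates files at a coarse granularity, one directory per partition. A bucketed table, by contrast, hashes a chosen column to work out which bucket each row belongs to, and then sorts the data within each bucket.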

create external table t_employee_bucket(
id int,
name string,
job string,
manager int,
hiredate timestamp,
salary decimal(7,2),
deptno int)
clustered by(id) sorted by(salary asc) into 6 buckets  
row format delimited 
fields terminated by "\t"
location '/hive/employee_bucket'
;
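To get a feel for how rows are assigned to buckets, the bucket number can be approximated with Hive's built-in hash and pmod functions. This sketch assumes the older hash-modulo scheme; newer Hive releases may use a different hash function for bucketing, so the numbers are illustrative and need not match the files Hive actually writes.

```sql
-- Approximate the bucket each row would fall into for 6 buckets.
-- Assumption: bucket = hash(bucketing column) mod number of buckets.
select id,
       pmod(hash(id), 6) as approx_bucket
from t_employee
where deptno = 10;
```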

0: jdbc:hive2://centos:10000> set hive.enforce.bucketing = true;
No rows affected (0.024 seconds)
0: jdbc:hive2://centos:10000> insert into table t_employee_bucket(id,name,job,manager,hiredate,salary,deptno) select id,name,job,manager,hiredate,salary,deptno  from t_employee where deptno=10;
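After the insert, the bucketing can be checked roughly as follows. This is a sketch, not output from the original session, and it assumes the insert above completed. As far as I know, hive.enforce.bucketing is only needed on Hive 1.x, since later versions enforce bucketing on insert by default.

```sql
-- Up to one file per bucket is expected under the table location (6 here).
dfs -ls /hive/employee_bucket;

-- Read only the first of the 6 buckets instead of the whole table.
select id, name, salary
from t_employee_bucket
tablesample(bucket 1 out of 6 on id);
```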
