The four table types in Hive

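Internal (managed) table: this is the ordinary kind of table, and all the tables discussed so far are internal tables. When the table definition is dropped, the data in the table is deleted along with it.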

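External table: the existence of the data and the table definition do not constrain each other. The table is only a reference to the corresponding files on HDFS, so when the table definition is dropped, the data is still there.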

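To create an external table, use the external keyword, which is what distinguishes the statement from creating an internal table:

create external table tblname(colname coltype...);

To load data, point the table at the location of the data on HDFS: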

alter table tblname set location 'hdfs_absolute_uri';

An external table can also be given the data location when it is created, so that it references the data already at that location.

create external table tblname(colname coltype...) location 'hdfs_absolute_uri';
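As an illustration of the behaviour described above, here is a minimal sketch showing that dropping an external table leaves its files in place. The table name ext_logs, its columns, and the namenode address are hypothetical placeholders, and the dfs command assumes you are running inside the Hive CLI.

```sql
-- Hypothetical table, columns, and HDFS path; replace with your own.
create external table ext_logs (
  id int,
  msg string
)
row format delimited fields terminated by '\t'
location 'hdfs://namenode:8020/data/ext_logs';

-- Dropping an external table removes only its definition from the metastore.
drop table ext_logs;

-- The data files should still be listed afterwards.
dfs -ls hdfs://namenode:8020/data/ext_logs;
```

Running the same sequence against an internal table would be expected to remove the table's directory as well.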

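Converting between internal and external tables:

Internal to external:
alter table tblname set tblproperties('EXTERNAL'='TRUE');
External to internal:
alter table tblname set tblproperties('EXTERNAL'='FALSE');

The property key is written in uppercase here because table property keys are case-sensitive; a lowercase 'external' may simply be stored as an unrelated custom property.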
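To confirm that a conversion actually took effect, one option is to look at the table type that Hive reports. This is a small sketch using the placeholder table name from above.

```sql
-- In the output, look for the "Table Type" row:
-- MANAGED_TABLE for an internal table, EXTERNAL_TABLE for an external one.
describe formatted tblname;
```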

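Partitioned tables

How do you create a partitioned table? Just add partitioned by, followed by the partition column(s), after the column list of an ordinary create table statement: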

create table tblname (
id int comment 'id',
name string comment 'name' 
) partitioned by (dt date comment 'create time')
row format delimited fields terminated by '\t';

How do you load data?

load data local inpath 'linux_fs_path' into table tblname partition(dt='2018-12-07');

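Some partition operations:

Query the data in a partition: select * from tblname where dt='2018-12-07'; (the partition column acts as a condition in the where clause)
Manually create a partition: alter table tblname add partition(dt='2018-12-07');
List the partitions of a partitioned table: show partitions tblname;
Drop a partition (its data is deleted with it): alter table tblname drop partition(dt='2018-12-07');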
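Each partition corresponds to a subdirectory under the table's directory, which is why a filter on the partition column can skip the other partitions entirely. The sketch below assumes an internal table in the default database and the default warehouse path /user/hive/warehouse; both are assumptions, so adjust the path to your hive.metastore.warehouse.dir. The dfs command assumes the Hive CLI.

```sql
-- "if not exists" avoids an error when the partition is already there.
alter table tblname add if not exists partition(dt='2018-12-07');

-- Under the assumptions above, the listing should include
-- a subdirectory named dt=2018-12-07.
dfs -ls /user/hive/warehouse/tblname;

-- "if exists" likewise makes the drop safe to repeat.
alter table tblname drop if exists partition(dt='2018-12-07');
```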

How do you create a table with several partition columns? Much the same way as with a single one:

create table tblname (
id int comment 'id',
name string comment 'name'
) partitioned by (year int comment 'admission year', school string comment 'school name')
row format delimited fields terminated by '\t';

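A partition can also reference data that already exists on HDFS:

alter table tblname partition(year='2018', school='crxy') set location 'hdfs_uri';

Note:

The partition must already exist, and the location must be an absolute HDFS path.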
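Putting the two requirements from the note together, the sequence might look like the following sketch. The namenode address and directory are hypothetical placeholders, not values from the original.

```sql
-- Step 1: make sure the partition exists before changing its location.
alter table tblname add if not exists partition(year='2018', school='crxy');

-- Step 2: point it at the data, using a full HDFS URI (hypothetical path).
alter table tblname partition(year='2018', school='crxy')
set location 'hdfs://namenode:8020/data/students/2018/crxy';

-- Queries that filter on both partition columns now read from that directory.
select * from tblname where year=2018 and school='crxy';
```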

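Bucketed table: a bucketed table hashes the value of the bucketing column and stores the rows in different files according to the result. If you look at the contents of each bucket file, you can see that a row's file is determined by the hash value modulo the number of buckets. How do you create a bucketed table?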

create table tblname_bucket(id int) clustered by (id) into 3 buckets;

Explanation:

clustered by: the column to bucket on

into x buckets: split the data into x buckets
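To get a rough feel for which bucket a row would land in, you can compute the modulo yourself. This is only a sketch under stated assumptions: the source table tbl_other has a non-negative int column id, and Hive's classic bucketing hash applies, under which an int hashes to its own value. Newer Hive versions may use a different hash function, so the real assignment can differ.

```sql
-- Assumption: id is a non-negative int and the classic hash is in use.
-- The divisor 3 matches "into 3 buckets" in the table definition.
select id,
       pmod(id, 3) as expected_bucket
from tbl_other;
```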

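How do you load data?

load data is not suitable here, because it only moves files into place and does not distribute rows into buckets. Populate the table from another table instead: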

insert into table tblname_bucket select * from tbl_other;

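Note: before inserting data you need to turn on bucketing enforcement, otherwise the inserted data will not be bucketed. (This applies to Hive versions before 2.0; from Hive 2.0 on, the property was removed and bucketing is always enforced.)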

set hive.enforce.bucketing=true;
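A possible end-to-end check, sketched under a few assumptions: tbl_other has an int column id, the table lives under the default warehouse path /user/hive/warehouse, and the commands run in the Hive CLI. Bucket files are typically named 000000_0, 000001_0, and so on, but treat the exact names as something to verify in your own listing.

```sql
-- Needed only before Hive 2.0; omit this line on newer versions.
set hive.enforce.bucketing=true;

insert overwrite table tblname_bucket
select id from tbl_other;

-- With 3 buckets, the listing is expected to show 3 files.
dfs -ls /user/hive/warehouse/tblname_bucket;

-- Inspect one bucket file (file name assumed from the listing above).
dfs -cat /user/hive/warehouse/tblname_bucket/000000_0;
```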

The main uses of bucketed tables:

Data sampling

Speeding up certain queries
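For the sampling use case, Hive's tablesample clause can read a single bucket instead of the whole table. The sketch below reuses the 3-bucket table defined earlier; the alias s is just an illustrative name.

```sql
-- Read only the first of the 3 buckets, roughly one third of the rows.
select *
from tblname_bucket tablesample(bucket 1 out of 3 on id) s;
```

Because the sample is taken on the same column the table is clustered by, Hive can answer it from one bucket file rather than scanning everything.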

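Note in particular: clustered by and sorted by in the table definition do not affect how data is imported. This means the user is responsible for how the data gets in, including bucketing and sorting it.

'set hive.enforce.bucketing = true' lets Hive automatically set the number of reducers in the final stage to match the number of buckets.

Alternatively, you can set mapred.reduce.tasks yourself to match the number of buckets.

Using 'set hive.enforce.bucketing = true' is the recommended approach.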
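If you do take the manual route, the reducer count has to equal the bucket count and the query itself has to distribute rows by the bucketing column. A minimal sketch, again assuming tbl_other has an int column id:

```sql
-- One reducer per bucket: 3 matches "into 3 buckets" in the table definition.
set mapred.reduce.tasks=3;

-- cluster by sends rows with the same hash of id to the same reducer.
insert overwrite table tblname_bucket
select id from tbl_other cluster by id;
```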
