Hive: Dynamic Partitioning by Year and Month, and Creating Bucketed Tables

Goal: partition a table by the year and month of each row's creation time.

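With static partitioning, the statement fixes each partition column to a constant value. Dynamic partitioning is far more flexible: Hive derives the partition values from the data being inserted.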
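To make the difference concrete, here is a minimal sketch that loads the same kind of data both ways. The tables orders_part and orders_src and their columns are hypothetical names used only for illustration, and the dynamic form assumes nonstrict mode is enabled.

```sql
-- Hypothetical tables: orders_src (id bigint, amount string, yr int, mon int)
-- and orders_part (id bigint, amount string) partitioned by (year int, month int).

-- Static partitioning: the partition values are constants in the statement,
-- so one statement fills exactly one partition.
insert overwrite table orders_part partition (year = 2019, month = 5)
select id, amount
from orders_src
where yr = 2019 and mon = 5;

-- Dynamic partitioning: the partition values come from the trailing
-- columns of the select list, so one statement can fill many partitions.
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table orders_part partition (year, month)
select id, amount, yr, mon
from orders_src;
```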

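A partition is really just a directory under the table's directory. A table can be partitioned on several dimensions, and the partitions nest as a directory tree.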

First, import the MySQL table testtable into Hive with Sqoop, letting Sqoop create the Hive table automatically. (Skip this step if the Hive table already exists.)

Then copy out the table's DDL (show create table testtable).

Create a new table with the same columns, a different name, and year and month partition columns added, as follows:

create table `testtable1` (
  `id` bigint,
  `payorderno` string,
  `tradeno` string,
  `tradetype` int,
  `goodsname` string,
  `tradetime` bigint,
  `storename` string,
  `paymentuser` string,
  `payway` int,
  `tradeamount` string,
  `refundno` string,
  `servicecharge` string,
  `remark` string,
  `bankcode` string,
  `thirdpayaccountid` string,
  `creationtime` bigint
)
partitioned by (year int, month int)
row format delimited fields terminated by '\t'
stored as parquetfile;

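Turn off strict partition mode

Dynamic partitioning runs in strict mode by default, which requires at least one partition column to be given a static value. Because both year and month are dynamic here, switch to nonstrict mode.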

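-- Partition mode; the default is strict
set hive.exec.dynamic.partition.mode=nonstrict;
-- Enable dynamic partitioning; the default is true
set hive.exec.dynamic.partition=true;
-- Maximum number of dynamic partitions; the default is 1000
set hive.exec.max.dynamic.partitions=1000;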
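If the insert touches many year/month combinations, the per-node limit may matter as well. The following is a sketch, assuming the default of hive.exec.max.dynamic.partitions.pernode (100) turns out to be too low for your data; the value 500 is an arbitrary illustration, not a recommendation.

```sql
-- Print the current value of a property (set with a key and no value).
set hive.exec.dynamic.partition.mode;

-- Maximum number of dynamic partitions a single mapper or reducer may create.
-- 500 is an illustrative value only.
set hive.exec.max.dynamic.partitions.pernode=500;
```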

Then insert the data into the new table, letting Hive partition it dynamically by time. After this, the new table is partitioned.

Conversion when creationtime is a date/time value:

insert overwrite table testtable1 partition (year, month)
select *, year(creationtime) as year, month(creationtime) as month
from testtable;

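Conversion when creationtime is a Unix timestamp in milliseconds (when data is imported in Parquet format, time columns are stored as timestamps by default):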

insert overwrite table testtable1 partition (year, month)
select *,
  from_unixtime(cast(creationtime/1000 as bigint), 'yyyy') as year,
  month(from_unixtime(cast(creationtime/1000 as bigint), 'yyyy-MM-dd HH:mm:ss')) as month
from testtable;
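Because the next step drops the original table, it may be worth checking the partitioned copy first. A minimal sketch of such a check, using only the two tables from the steps above:

```sql
-- List the partitions that the dynamic insert created.
show partitions testtable1;

-- Row counts should match between the source and the partitioned copy.
select count(*) from testtable;
select count(*) from testtable1;

-- Rows per partition, to spot obviously wrong year/month values.
select year, month, count(*) as cnt
from testtable1
group by year, month
order by year, month;
```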

Drop the original table and rename the partitioned table

drop table testtable;
alter table testtable1 rename to testtable;

This is how the data is stored after dynamic partitioning: each partition is a directory.
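Partition directories are typically named key=value, for example year=2019/month=5 under the table's directory. The sketch below shows one way to see this from Hive itself; the partition year=2019, month=5 is a hypothetical example, so substitute one that show partitions actually lists.

```sql
-- Show one partition's metadata, including its Location (an HDFS directory).
-- year = 2019, month = 5 is a hypothetical partition.
describe formatted testtable partition (year = 2019, month = 5);

-- Filtering on the partition columns lets Hive read only that directory
-- instead of scanning the whole table.
select count(*)
from testtable
where year = 2019 and month = 5;
```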

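Bucketing hashes the value of a column and sends rows with the same hash value (modulo the number of buckets) to the same bucket.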

When creating a bucketed table, you must specify the bucketing column and the number of buckets.
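The idea of hashing a column into a fixed number of buckets can be sketched with Hive's built-in hash() and pmod() functions. This is an illustration of the concept only: the hash Hive applies internally when writing bucket files may differ by version, so bucket_id here is a model, not necessarily the file a row lands in. The query assumes a source table named user with a userid column, as used later in this post.

```sql
-- Illustration only: map each userid to one of 2 buckets.
-- pmod keeps the result non-negative even if hash() returns a negative value.
select userid,
       pmod(hash(userid), 2) as bucket_id
from `user`;
```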

The SQL statement for creating a bucketed table is:

create table user_bucket (
  userid int,
  username string,
  fullname string
)
clustered by (userid) into 2 buckets
row format delimited fields terminated by '\t' lines terminated by '\n'
stored as parquetfile;

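Steps for loading data into the user_bucket bucketed table:

Enable bucketing enforcement (this applies to Hive versions before 2.0; from 2.0 on, bucketing is always enforced and the property no longer exists):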

set hive.enforce.bucketing = true;

Run the SQL statement:

insert overwrite table user_bucket select userid, username, fullname from `user`;
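One way to check the result is to look at the table metadata and then read a single bucket. A minimal sketch, using the user_bucket table created above:

```sql
-- The output includes the Num Buckets and Bucket Columns fields.
describe formatted user_bucket;

-- Read only the first of the 2 buckets.
select userid, username, fullname
from user_bucket tablesample (bucket 1 out of 2 on userid);
```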

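Note: partitioning and bucketing both organize how data is stored by column. With partitioning, rows with the same column value are stored in the same directory; with bucketing, rows whose column hash falls in the same bucket are stored in the same file.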

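Note also: loading data files into a bucketed table with load does not work. load is only a copy/move operation that places the data files in the table's location in Hive. It does not run a MapReduce job, so it cannot produce buckets.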
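A common workaround is to load the file into a plain staging table and then insert from it into the bucketed table. The sketch below assumes a tab-separated file; the table user_staging and the path /tmp/users.tsv are hypothetical names chosen for illustration.

```sql
-- Hypothetical staging table in plain text format, matching the file layout.
create table user_staging (
  userid int,
  username string,
  fullname string
)
row format delimited fields terminated by '\t'
stored as textfile;

-- load only moves the file into the staging table's directory.
-- '/tmp/users.tsv' is a hypothetical HDFS path.
load data inpath '/tmp/users.tsv' into table user_staging;

-- The insert runs a job that hashes userid and writes the bucket files.
insert overwrite table user_bucket
select userid, username, fullname
from user_staging;
```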

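In a select query, cluster by is equivalent to distribute by plus sort by on the same column: rows are distributed to reducers by the hash of the column, and each reducer's output is sorted.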
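In a query, cluster by on a column is shorthand for distribute by plus sort by on that same column. A short sketch against the user_bucket table from above:

```sql
-- These two queries distribute and sort rows the same way.
select userid, username from user_bucket cluster by userid;

select userid, username from user_bucket distribute by userid sort by userid;

-- Written separately, distribute by and sort by may use different columns,
-- and sort by can sort in descending order.
select userid, username from user_bucket distribute by userid sort by username desc;
```

Note that the ordering holds within each reducer only; a total order over the whole result needs order by.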

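By contrast, clustered by in a create table statement means that rows with the same value in the declared column are sent to the same reducer, and it defines the column on which the table is bucketed.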

Done!
