Hive partitions: static and dynamic partitioning

Partitioning can greatly improve Hive's performance. Before getting into it, a word on how a data warehouse is layered:

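Raw data layer: stores the data exactly as it was collected.

Warehouse detail layer: transformation and analysis happen here, including part of the data cleansing.

Warehouse service layer: processing for outward-facing business needs, such as dimension-to-** key conversion, ID card number cleansing, member registration ** cleansing, field merging, null handling, dirty data handling, and IP cleansing and conversion.

Final business layer.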

Static partitions (suited to incremental tables with large data volumes)

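Create a user table with three fields, id, name, and birth, plus one more column (masked as *** in this text) that is used for partitioning. The initial idea is to split the rows into men and women.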

hive> create table user(
    > id string,
    > name string,
    > birth string
    > )
    > partitioned by (*** string)
    > row format delimited fields terminated by ','
    > stored as textfile;
OK
Time taken: 0.143 seconds

Load the data with LOAD.

The result is not what we expected: the command sets *** to male for every row, which defeats the purpose of partitioning.
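The LOAD statement itself is not shown in the text. A minimal sketch of what such a command probably looked like, and why it tags every row the same way. The file path /tmp/users.csv is hypothetical, and the masked partition column *** is written as gender here purely for illustration:

```sql
-- Assumption: the partition column masked as *** in the text is called gender here.
-- Assumption: /tmp/users.csv is a hypothetical local file of comma-separated user rows.
-- `user` is backtick-quoted because it is a reserved word in newer Hive releases.
LOAD DATA LOCAL INPATH '/tmp/users.csv'
INTO TABLE `user`
PARTITION (gender = 'male');

-- LOAD only places the file in the partition's directory; it never inspects the rows.
-- The partition value comes from the directory, so every row reports gender = 'male'.
SELECT gender, count(*) FROM `user` GROUP BY gender;
```

Because the partition value lives in the directory path rather than in the data file, a single LOAD into one partition cannot split a mixed file.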

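After a moment's thought, the plan is to add the partitions first and then insert the matching data into each one (inserting the matching data directly also works).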

hive> alter table user add
    > partition(***="male")
    > partition(***="female")
    > ;
OK
Time taken: 0.136 seconds

This creates the partitions directly, but loading data into them is still not very convenient.
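The inconvenience is that LOAD needs one input file per partition value, so the source data has to be split beforehand. A sketch of what that would look like, assuming two hypothetical pre-split local files and, again, the illustrative column name gender in place of the masked ***:

```sql
-- Assumption: gender stands in for the masked *** partition column.
-- Assumption: both file paths are hypothetical and already contain only matching rows.
LOAD DATA LOCAL INPATH '/tmp/users_male.csv'
INTO TABLE `user` PARTITION (gender = 'male');

LOAD DATA LOCAL INPATH '/tmp/users_female.csv'
INTO TABLE `user` PARTITION (gender = 'female');

-- List the partitions that now exist for the table.
SHOW PARTITIONS `user`;
```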

So the rest of this section uses INSERT instead.

First, bring the data in as an ordinary, unpartitioned table:

hive> create table ods_users(userid string, username string, birthday string, *** string)
    > row format delimited fields terminated by ','
    > location '/opt/hive/users.csv';
OK
Time taken: 0.048 seconds

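Then select the needed columns from that table, filter on the value of ***, and insert the rows into the matching partition: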

hive> insert into user partition(***='male')
    > select userid, username, birthday
    > from ods_users where ***='male';

This works.
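With static partitions, each partition value needs its own INSERT. A sketch of the matching statement for the second partition, followed by a quick check. As before, gender is only an illustrative stand-in for the masked *** column:

```sql
-- Assumption: gender stands in for the masked *** column in both tables.
INSERT INTO TABLE `user` PARTITION (gender = 'female')
SELECT userid, username, birthday
FROM ods_users
WHERE gender = 'female';

-- Confirm that both partitions exist and that a filtered read returns rows.
SHOW PARTITIONS `user`;
SELECT * FROM `user` WHERE gender = 'female' LIMIT 5;
```

The SELECT list leaves out the partition column, because in a static insert its value is fixed by the PARTITION clause.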

Dynamic partitions (suited to smaller data volumes loaded in full)

Now for dynamic partitioning. Switch it on whenever it is needed; the settings apply to the current session:

hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
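For context, nonstrict mode is what allows every partition column to be dynamic; in strict mode Hive requires at least one static partition value in the PARTITION clause. Hive also caps how many partitions a single statement may create. A sketch of the related settings, where the limit values shown are what I understand the defaults to be and should be checked against your Hive version:

```sql
-- Enable dynamic partitioning for this session.
SET hive.exec.dynamic.partition=true;
-- nonstrict: all partition columns may be dynamic (strict needs at least one static value).
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Optional caps; the values below are assumed defaults, raise them only if an insert hits the limit.
SET hive.exec.max.dynamic.partitions=1000;        -- total partitions one statement may create
SET hive.exec.max.dynamic.partitions.pernode=100; -- partitions per mapper or reducer
```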

The table is created the same way, and rows can be inserted directly without specifying a value for ***:

hive> create table myusers(userid string, username string, birthday string) partitioned by (*** string)
    > row format delimited fields terminated by ',' stored as textfile;
OK
Time taken: 0.039 seconds
hive> insert into myusers partition(***)
    > select * from ods_users;
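The select * form works here only because the partition column happens to be the last column of ods_users: Hive takes dynamic partition values from the trailing columns of the SELECT by position, not by name. A more explicit sketch, with gender standing in for the masked *** column as an assumption:

```sql
-- Assumption: gender stands in for the masked *** column in both tables.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The dynamic partition column must come last in the SELECT list.
INSERT INTO TABLE myusers PARTITION (gender)
SELECT userid, username, birthday, gender
FROM ods_users;

-- One partition is created per distinct gender value found in ods_users.
SHOW PARTITIONS myusers;
```

Listing the columns explicitly keeps the insert correct even if the column order of the source table changes later.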
