Hive external table creation example

2021-08-15 11:52:37

Create one HDFS directory per table:

hdfs dfs -mkdir -p /external/sr/sr_created
hdfs dfs -mkdir -p /external/sr/sr_assign
hdfs dfs -mkdir -p /external/sr/sr_cancelled
hdfs dfs -mkdir -p /external/sr/sr_handle
hdfs dfs -mkdir -p /external/sr/sr_received

Upload each local data file into its directory:

hdfs dfs -put sr_created.txt /external/sr/sr_created/
hdfs dfs -put sr_assign.txt /external/sr/sr_assign/
hdfs dfs -put sr_cancelled.txt /external/sr/sr_cancelled/
hdfs dfs -put sr_handle.txt /external/sr/sr_handle/
hdfs dfs -put sr_received.txt /external/sr/sr_received/
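
Because the five directories and files follow one naming pattern, the mkdir and put commands above can also be written as a loop. This is a minimal sketch, assuming the sr_*.txt files are in the current directory. The function name upload_sr_files and the HDFS_CMD variable are hypothetical additions; HDFS_CMD only exists so the loop can be dry-run with HDFS_CMD=echo when no cluster is available.

```shell
#!/usr/bin/env bash
# Sketch: create one HDFS directory per table and upload the matching local file.
# Assumes sr_<name>.txt files are in the current directory.
# HDFS_CMD is a hypothetical override (defaults to "hdfs"); HDFS_CMD=echo gives a dry run.

SR_ROOT=/external/sr
SR_TABLES="sr_created sr_assign sr_cancelled sr_handle sr_received"

upload_sr_files() {
  local hdfs="${HDFS_CMD:-hdfs}" t
  # $SR_TABLES is left unquoted on purpose so it splits into the five names
  for t in $SR_TABLES; do
    # -mkdir -p does nothing if the directory already exists
    "$hdfs" dfs -mkdir -p "${SR_ROOT}/${t}" || return 1
    # -put fails if the target file already exists, which stops the loop here
    "$hdfs" dfs -put "${t}.txt" "${SR_ROOT}/${t}/" || return 1
  done
}
```

Run upload_sr_files for the real upload, or HDFS_CMD=echo upload_sr_files to print the ten commands first. The function returns at the first failing command rather than continuing with a half-built directory layout.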

Create an external table over each directory (fields are tab-separated):

create external table sr_created(
  ticket_id string,
  phone string,
  event_time string
) row format delimited fields terminated by '\t' location '/external/sr/sr_created';

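-- the table keeps the spelling sr_assgin used by the queries below; its data was uploaded to /external/sr/sr_assign
create external table sr_assgin(
  ticket_id string,
  phone string,
  event_time string
) row format delimited fields terminated by '\t' location '/external/sr/sr_assign';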

create external table sr_handle(
  ticket_id string,
  phone string,
  event_time string
) row format delimited fields terminated by '\t' location '/external/sr/sr_handle';

create external table sr_cancelled(
  ticket_id string,
  phone string,
  event_time string
) row format delimited fields terminated by '\t' location '/external/sr/sr_cancelled';

create external table sr_received(
  ticket_id string,
  phone string,
  event_time string
) row format delimited fields terminated by '\t' location '/external/sr/sr_received';
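
The five statements differ only in the table name and the location, so they can also be generated. The sketch below prints one statement per directory under /external/sr; the function name gen_sr_ddl is hypothetical. It names each table after its directory, so it produces sr_assign rather than the sr_assgin spelling used in this article's queries; adjust one or the other if you use it. It also adds if not exists so the generated script can be rerun safely.

```shell
#!/usr/bin/env bash
# Sketch: print CREATE EXTERNAL TABLE statements for the five tables, which all
# share the same three string columns and tab-delimited layout.
# Assumption: each table is named after its directory under the root (default /external/sr).

gen_sr_ddl() {
  local root="${1:-/external/sr}" t
  for t in sr_created sr_assign sr_cancelled sr_handle sr_received; do
    # Unquoted heredoc: ${t} and ${root} expand, while \t stays a literal backslash-t for Hive
    cat <<EOF
create external table if not exists ${t} (
  ticket_id string,
  phone string,
  event_time string
) row format delimited fields terminated by '\t' location '${root}/${t}';
EOF
  done
}
```

For example: gen_sr_ddl > sr_ddl.sql && hive -f sr_ddl.sql (sr_ddl.sql is an arbitrary file name).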

--- Map join enabled (the default), parallel execution not enabled

With no settings changed:

select t1.event_time, t2.event_time, t3.event_time
from (
  select ticket_id, max(event_time) as event_time from sr_created group by ticket_id
) t1
left outer join (
  select ticket_id, max(event_time) as event_time from sr_assgin group by ticket_id
) t2 on t1.ticket_id = t2.ticket_id
left outer join (
  select ticket_id, max(event_time) as event_time from sr_handle group by ticket_id
) t3 on t1.ticket_id = t3.ticket_id;

When every join uses the same join key (here t1.ticket_id for both joins), Hive merges the joins into a single job.

Time taken: 74.025 seconds. The virtual machine used here is low-spec, so timings in a production environment will differ.
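
To see how the query is actually split up, including whether the two joins were merged, you can prefix it with EXPLAIN and read the STAGE DEPENDENCIES section of the plan. The helper below is a rough sketch that counts the stages listed there; the name count_stages is hypothetical, and it assumes the plain-text plan printed by the Hive CLI (Beeline wraps each line in a table border, which this does not handle). The stage count is not the same as the number of launched jobs, because it also includes non-MapReduce stages such as the final fetch stage.

```shell
#!/usr/bin/env bash
# Sketch: read Hive EXPLAIN output on stdin and print how many stages are listed
# between the "STAGE DEPENDENCIES:" header and the next blank line.

count_stages() {
  awk '
    /^STAGE DEPENDENCIES:/ { in_deps = 1; next }
    # a blank line ends the dependency section; exit still runs END
    in_deps && /^[ \t]*$/ { exit }
    # stage lines look like "  Stage-1 is a root stage"
    in_deps && /^[ \t]+Stage-[0-9]+/ { n++ }
    END { print n + 0 }
  '
}
```

For example, with the query saved in a file (sr_query.sql is a hypothetical name): hive -e "EXPLAIN $(cat sr_query.sql)" | count_stages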

--- Parallel execution enabled, map join enabled (the default)

set hive.exec.parallel=true;

launching job 1 out of 5
launching job 2 out of 5
launching job 3 out of 5

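With parallel execution on, jobs 1, 2 and 3 are launched together. Why can they run in parallel? Because the three group by subqueries (t1, t2, t3) do not depend on one another, so Hive can run their jobs at the same time.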

Time taken: 74.492 seconds, about the same as the run without parallel execution.

--- Parallel execution enabled, map join disabled

launching job 1 out of 4
launching job 2 out of 4
launching job 3 out of 4

Time taken: 74.437 seconds
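
To repeat the three measurements above without editing the session each time, the settings can be passed on the command line with --hiveconf. This is a minimal sketch: the function names, the query file name, and the HIVE_CMD override (which allows a dry run with HIVE_CMD=echo) are assumptions, not part of the original setup. Here map join is switched off through hive.auto.convert.join=false.

```shell
#!/usr/bin/env bash
# Sketch: run one query file under a given combination of the two settings.
# HIVE_CMD is a hypothetical override (defaults to "hive"); HIVE_CMD=echo prints the arguments instead.

run_variant() {
  local parallel="$1" mapjoin="$2" sql_file="$3"
  # hive.exec.parallel: run independent stages at the same time (false by default)
  # hive.auto.convert.join: let Hive turn a join into a map join when the small side
  #   is under the configured size threshold (true by default)
  "${HIVE_CMD:-hive}" \
    --hiveconf hive.exec.parallel="$parallel" \
    --hiveconf hive.auto.convert.join="$mapjoin" \
    -f "$sql_file"
}

# The three configurations compared in this article, in the same order
run_all_variants() {
  local sql_file="$1"
  run_variant false true "$sql_file"   # map join on (default), parallel off
  run_variant true true "$sql_file"    # parallel on, map join on
  run_variant true false "$sql_file"   # parallel on, map join off
}
```

For example: run_all_variants sr_query.sql (a hypothetical file holding the query). The Hive CLI prints a Time taken line after each run, which can be compared with the figures above.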
