15 Hadoop Compression and Storage

Figure: mind map of Hadoop compression and storage

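To support multiple compression and decompression algorithms, Hadoop introduces codecs (coder/decoders), as listed below:

- DEFLATE: org.apache.hadoop.io.compress.DefaultCodec
- gzip: org.apache.hadoop.io.compress.GzipCodec
- bzip2: org.apache.hadoop.io.compress.BZip2Codec
- LZO: com.hadoop.compression.lzo.LzopCodec
- Snappy: org.apache.hadoop.io.compress.SnappyCodec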

2) Enable compression of map output in MapReduce

3) Set the compression codec used for map output data in MapReduce

4) Run a query
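The steps above can be sketched as Hive session settings. This is a minimal sketch, not a definitive configuration: the first statement is presumably the step 1 that the list omits, the choice of Snappy is an assumption (any codec installed on the cluster can be substituted), and the query at the end is only a placeholder that uses the log_text table created later in this section.

```sql
-- 1) (assumed missing step) enable compression of Hive intermediate data
set hive.exec.compress.intermediate=true;

-- 2) enable compression of map output in MapReduce
set mapreduce.map.output.compress=true;

-- 3) choose the codec for map output (Snappy is an assumption here)
set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- 4) run any query that launches a MapReduce job (placeholder query)
select count(url) from log_text;
```

These settings apply only to the current Hive session; they affect the data shuffled between map and reduce, not the files written as the final result.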

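2) Enable compression of the final MapReduce output

3) Set the compression codec used for the final MapReduce output

4) Set the final MapReduce output compression type to block compression

5) Check whether the output is a compressed file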
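The steps above can be sketched in the same way. Again this is a sketch under assumptions: the first statement is presumably the omitted step 1, Snappy is an assumed codec choice, and both the output directory /tmp/compress-result and the source table log_text are hypothetical placeholders.

```sql
-- 1) (assumed missing step) enable compression of Hive's final output
set hive.exec.compress.output=true;

-- 2) enable compression of the final MapReduce output
set mapreduce.output.fileoutputformat.compress=true;

-- 3) choose the codec for the final output (Snappy is an assumption here)
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- 4) use block compression (applies to SequenceFile output)
set mapreduce.output.fileoutputformat.compress.type=BLOCK;

-- 5) write a query result to a local directory (hypothetical path), then inspect the files
insert overwrite local directory '/tmp/compress-result'
select * from log_text;
```

If compression took effect, the files in the output directory should carry the codec's extension, for example .snappy when SnappyCodec is used.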

Characteristics of columnar storage:

TEXTFILE and SEQUENCEFILE are row-oriented storage formats.

ORC and PARQUET are column-oriented storage formats.

row data：

stripe footer：

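Each file has a file footer, which stores the number of rows in each stripe, the data type of each column, and similar metadata. At the very end of each file is a postscript, which records the compression type of the whole file and the length of the file footer. When reading a file, the reader seeks to the end of the file and reads the postscript, parses the file footer length from it, reads the file footer, parses the information about each stripe from it, and only then reads the stripes. In other words, the file is read from back to front.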

create table log_text (
    track_time string,
    url string,
    session_id string,
    referer string,
    ip string,
    end_user_id string,
    city_id string
)
row format delimited fields terminated by '\t'
stored as textfile;

(2) Load data into the table

hive (default)> load data local inpath '/root/log' into table log_text;

(3) Check the size of the table data

hive (default)> dfs -du -h /user/hive/warehouse/log_text;

18.1 M /user/hive/warehouse/log_text/log.data

create table log_orc (
    track_time string,
    url string,
    session_id string,
    referer string,
    ip string,
    end_user_id string,
    city_id string
)
row format delimited fields terminated by '\t'
stored as orc;

(2) Load data into the table

hive (default)> insert into table log_orc select * from log_text;

(3) Check the size of the table data

hive (default)> dfs -du -h /user/hive/warehouse/log_orc/;

2.8 M /user/hive/warehouse/log_orc/000000_0

create table log_parquet (
    track_time string,
    url string,
    session_id string,
    referer string,
    ip string,
    end_user_id string,
    city_id string
)
row format delimited fields terminated by '\t'
stored as parquet;

(2) Load data into the table

hive (default)> insert into table log_parquet select * from log_text;

(3) Check the size of the table data

hive (default)> dfs -du -h /user/hive/warehouse/log_parquet/;

13.1 M /user/hive/warehouse/log_parquet/000000_0

create table log_orc_none (
    track_time string,
    url string,
    session_id string,
    referer string,
    ip string,
    end_user_id string,
    city_id string
)
row format delimited fields terminated by '\t'
stored as orc tblproperties ("orc.compress"="NONE");
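The loading step for this table is not shown. It presumably mirrors the one used for log_orc earlier, so a sketch under that assumption would be:

```sql
-- Assumed step (2): copy the rows of the text table into the uncompressed ORC table
insert into table log_orc_none select * from log_text;
```

Because the rows come from the same log_text source, the resulting file size can be compared directly with the sizes reported for the other tables.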

(3) Check the size of the data after insertion

7.7 M /user/hive/warehouse/log_orc_none/000000_0

create table … (
    track_time string,
    url string,
    session_id string,
    referer string,
    ip string,
    end_user_id string,
    city_id string
)
row format delimited fields terminated by '\t'
stored as orc tblproperties ("orc.compress"=…);

(3) Check the size of the data after insertion
