Hive: Compressed Storage

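Storing data compressed can sometimes give better performance, because less data has to be read from disk and moved across the network.

When a table is stored as textfile, the data files can be compressed with gzip or bzip2. The steps are as follows: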

create table raw (line string) row format delimited fields terminated by '\t' lines terminated by '\n';

load data local inpath '/tmp/weblogs/20090603-access.log.gz' into table raw;

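The drawback of this approach is that Hive cannot split a gzip-compressed file at query time, so the map phase cannot run in parallel: the whole file is read by a single map task.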
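One rough way to observe this behaviour, assuming the raw table above has been created and loaded with the single .gz file, is to run a query that forces a full scan and check how many map tasks the resulting job reports. The statements below are only a sketch; whether the stats setting is needed depends on the Hive version in use.

```sql
-- Sketch only: assumes the raw table from the steps above exists and is
-- backed by one gzip-compressed file.

-- Make sure the count is computed by a job rather than answered from
-- table statistics (this setting matters only on Hive versions that have it).
set hive.compute.query.using.stats=false;

-- Full scan of raw. With a single gzip file behind the table, the job is
-- expected to run with one map task, because the file cannot be split.
select count(*) from raw;
```

The mapper count appears in the job information that Hive prints while the query runs.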

A better approach is the following:

create table raw (line string) row format delimited fields terminated by '\t' lines terminated by '\n';

create table raw_sequence (line string) stored as sequencefile;

load data local inpath '/tmp/weblogs/20090603-access.log.gz' into table raw;

set hive.exec.compress.output=true;

set io.seqfile.compression.type=block; -- none/record/block

insert overwrite table raw_sequence select * from raw;

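The insert copies the data from raw into a second table, raw_sequence, which is stored as a sequencefile. With io.seqfile.compression.type set to block, records are buffered and compressed in batches rather than one value at a time (record). The resulting sequence file remains splittable, so queries on raw_sequence can run several map tasks in parallel.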
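The statements above do not say which codec compresses the output, so the cluster default is used. A minimal sketch of selecting one explicitly is shown below. It assumes the older Hadoop property name mapred.output.compression.codec (newer Hadoop releases prefer mapreduce.output.fileoutputformat.compress.codec) and uses GzipCodec purely as an example; neither choice comes from the original steps.

```sql
-- Sketch only: the codec property name and the codec itself are assumptions,
-- not part of the original procedure.
set hive.exec.compress.output=true;
set io.seqfile.compression.type=block;

-- Example codec; any codec class installed on the cluster could be used here.
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Rewrite the sequencefile table using the settings above.
insert overwrite table raw_sequence select * from raw;

-- Inspect the table's storage format and location.
describe formatted raw_sequence;
```

Because block compression is applied inside the sequence file, using a codec such as gzip here does not bring back the single-map limitation of a plain .gz text file.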
