Hive Compressed Storage Performance Test

textfile format:

-- date is a reserved word in newer Hive versions, so it is quoted with backticks
create table textfile (id int, name string)
partitioned by (`date` string)
row format delimited fields terminated by ',' lines terminated by '\n';

Parquet format:

-- same layout as the textfile table, stored as Parquet
create table parquet (id int, name string)
partitioned by (`date` string)
row format delimited fields terminated by ',' lines terminated by '\n'
stored as parquet;

Load data into the textfile table:

load data local inpath '/var/lib/hadoop-hdfs/test.gz' into table textfile;

Load data into the parquet table:

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.compress.output=true;
insert overwrite table parquet partition(`date`) select * from textfile;
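
For Parquet tables, the codec is usually selected through the parquet.compression table property rather than through hive.exec.compress.output alone. The following is only a minimal sketch under assumptions: the table name parquet_snappy and the SNAPPY codec are hypothetical and were not part of the test above.

```sql
-- Hypothetical table (not from the original test); same columns as above.
create table parquet_snappy (id int, name string)
partitioned by (`date` string)
stored as parquet
-- Assumption: SNAPPY is chosen only for illustration.
tblproperties ('parquet.compression'='SNAPPY');

-- Dynamic partitioning: the partition column must come last in the select list.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table parquet_snappy partition(`date`)
select id, name, `date` from textfile;
```

Setting the codec on the table keeps the choice with the table definition, so later inserts do not depend on session settings.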

File type        Size (GB)
textfile         477.7
parquet          347.2
(not labeled)    185.0
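
One way to collect on-disk sizes like these is the dfs command in the Hive CLI. This is a sketch only: the warehouse paths are an assumption (the default warehouse location) and may differ on your cluster.

```sql
-- Run inside the Hive CLI. Paths assume the default warehouse directory.
-- -s prints one summary line per path, -h prints human-readable units.
dfs -du -s -h /user/hive/warehouse/textfile;
dfs -du -s -h /user/hive/warehouse/parquet;
```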

Run a simple query against each table:

select count(*) from table;

Every table returns the same result:

1830337392

File type        Time (seconds)
textfile         156.204
parquet          247.831
(not labeled)    210.01

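In this test, textfile was the fastest to query and parquet the slowest. In other words, data that has not been compressed is faster to query. The ODS layer is therefore usually compressed to save storage space, while for the DW layer it may be worth leaving data uncompressed to speed up computation (the actual design depends on the business requirements).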

This simple test suggests that the approach described in the Hive programming guide is the best practice.
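
The ODS/DW split described above could be sketched as follows. The table names ods_events and dw_events and the gzip codec are hypothetical, chosen only for illustration; the source table is the textfile table from the test.

```sql
-- Hypothetical layer tables (not from the original test).
create table ods_events (id int, name string) stored as textfile;
create table dw_events (id int, name string) stored as textfile;

-- ODS layer: compress the output to save storage space.
-- Assumption: gzip is used here only as an example codec.
set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;
insert overwrite table ods_events select id, name from textfile;

-- DW layer: leave the output uncompressed to favor query speed.
set hive.exec.compress.output=false;
insert overwrite table dw_events select id, name from ods_events;
```

Because hive.exec.compress.output is a session setting, it has to be switched before each insert. Whether the uncompressed DW copy is worth the extra space depends on the workload.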
