Hive: Introduction to Hive Table File Compression

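Contents: Compression

(1) Compression overview

(2) Enabling compression at the map output stage

(3) Enabling compression at the reduce output stage

(4) Specifying a compression format when creating a table

Compression codecs supported by MR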

To support multiple compression/decompression algorithms, Hadoop introduces encoders/decoders (codecs).
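As a quick check of which codecs a given cluster has registered, Hive can print a Hadoop property with `set <name>;`. The sketch below assumes the cluster configures the standard Hadoop property io.compression.codecs; on some clusters it may never have been set, in which case Hive reports it as undefined.

```sql
-- Print the codec classes registered in the Hadoop configuration.
-- The output differs by cluster; the property may be undefined if it was never set.
set io.compression.codecs;
```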

Compression performance comparison

Suppose we have the following table:

create table emp_t(
id int,
name string,
deptno int)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
stored as textfile;

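Enabling compression at the map output stage reduces the amount of data transferred between the map and reduce tasks of a job. The configuration is as follows:

Hands-on example:

Enable compression of Hive's intermediate data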

set hive.exec.compress.intermediate=true;

Enable map output compression in MapReduce

set mapreduce.map.output.compress=true;

Set the compression codec for map output data in MapReduce
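The statement for this step does not appear in the text. A minimal sketch, assuming Snappy is the desired codec and the native Snappy library is available on the cluster (the codec choice is an assumption, not taken from the original):

```sql
-- Choose the codec used to compress map output.
-- SnappyCodec is an illustrative choice; any codec class installed on the cluster can be used.
set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
```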

Run the query

select count(1) as name from emp_t;

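When Hive writes output to a table, that output can also be compressed. The property hive.exec.compress.output controls this feature. Users may want to keep the default value of false from the default configuration file, so that output is uncompressed plain text by default. To enable output compression, set the property to true in a query or in the executing script.

Hands-on example:

Enable compression of Hive's final output data (default: false)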

set hive.exec.compress.output=true;

Enable compression of MapReduce's final output data (default: false)

set mapreduce.output.fileoutputformat.compress=true;

Set the compression codec for MapReduce's final output data.

Default: mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DeflateCodec
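To override that default, the same property can be set in the session. A sketch, assuming gzip output is acceptable for the job (the codec choice here is an assumption, not from the original):

```sql
-- Compress final job output with gzip instead of the default codec.
-- GzipCodec is an illustrative choice; gzip-compressed text files are not splittable.
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;
```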

Set MapReduce's final output compression type to block compression (NONE, RECORD, BLOCK)

set mapreduce.output.fileoutputformat.compress.type=BLOCK;

Right after the table is created, there are no files for it in HDFS.

insert into emp_t(id,name,deptno)values(1,'zhangsan',1);

After disabling MapReduce compression and inserting data, the file is in textfile format:
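The statements used to disable compression are not shown. Presumably they reverse the output settings enabled earlier; a sketch under that assumption (the inserted row values are hypothetical):

```sql
-- Turn final output compression back off for this session.
set hive.exec.compress.output=false;
set mapreduce.output.fileoutputformat.compress=false;

-- Hypothetical row, used only to produce a new uncompressed file.
insert into emp_t(id,name,deptno) values(2,'lisi',2);
```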

Specifying a compression format when creating a table has the same effect as enabling compression at the reduce output stage.

create table emp_t1(
id int,
name string,
deptno int)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
Insert data:
insert into emp_t1(id,name,deptno)values(2,'zhangsan',1);
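The storage clause of emp_t1 is not shown above. One common way to specify a compression format at table creation is a columnar file format with a compression table property. The sketch below assumes ORC with Snappy; the table name emp_t1_orc and the codec are illustrative assumptions, not the author's definition.

```sql
-- Hypothetical table: same columns as emp_t1, stored as ORC with Snappy compression.
create table emp_t1_orc(
id int,
name string,
deptno int)
stored as orc
tblproperties ("orc.compress"="SNAPPY");
```

With a definition like this, files written to the table are compressed regardless of the session-level output compression settings.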

View the files in HDFS:
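A sketch of listing the table's files from inside the Hive session using its built-in dfs command. The path assumes the default warehouse directory /user/hive/warehouse and the default database; adjust it for your cluster.

```sql
-- dfs runs an HDFS shell command from within the Hive CLI.
-- The path below is an assumption based on the default warehouse location.
dfs -ls /user/hive/warehouse/emp_t1;
```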
