Database: Hive SQL indexes and how to use them

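Recently I ran into a problem when joining a table of 800 million rows, used as the driving table, to a table with tens of millions of rows: the job ran extremely slowly, and most of the time went into scanning the large table. My first thought was data skew, so I checked the job against the usual skew scenarios and concluded that skew was not the cause. What next? Then I thought of using an index. I didn't know much about indexes, so I read through a lot of material; this post summarizes what I found, to deepen my own understanding.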
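One common way to check for skew on the join key is to look at how rows are distributed across key values. The query below is only a sketch of that check: big_table and join_key are hypothetical names standing in for the 800-million-row table and the column used in the join, not names from my actual job.

```sql
-- Hypothetical names: big_table is the large driving table,
-- join_key is the column it is joined on.
-- A handful of keys holding a very large share of the rows suggests skew.
select join_key,
       count(*) as row_cnt
from big_table
group by join_key
order by row_cnt desc
limit 20;
```

If the top counts are of the same order of magnitude as the rest, skew on that key is unlikely to be the cause, which matches what I concluded here.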

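How do you create an index in Hive?

The most important step is to create the table first, and there is a requirement on how the table is stored (skip this step if your table was already created as shown below). In the statements in this post, table and database are placeholders for your own table and database names: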

use dw_htldatadb;
drop table table;
create table table(
aa string comment 'asd',
bb int  comment 'asd'
) comment 'table comment'
partitioned by(d string comment 'date')
row format delimited fields terminated by ',' 
stored as textfile;

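If you create the table the following way instead, you may hit an error later when rebuilding the index (I honestly don't know why; it is just a pitfall I found while experimenting):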

use dw_htldatadb;
drop table table;
create table table(
aa string comment 'asd',
bb int  comment 'asd'
) comment 'table comment'
partitioned by(d string comment 'date')
row format delimited fields terminated by ',' 
stored as orc;

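Next, create the index. Note that while researching I saw people create a partitioned index, but I could not get that to work; I don't know whether it depends on the Hive version. So I created the index the most common way.

Working version: create an index on column aa of table table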

use database;
create index idx_table on table table(aa)
as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
with deferred rebuild
in table table_index;
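To see what the index actually stores, you can inspect the index table itself. This is a minimal sketch, assuming the index table is the one named in the in table clause of the statement above (substitute your own name). As far as I understand it, a compact index table holds the indexed column plus a _bucketname column (the data file) and an _offsets column (block offsets in that file).

```sql
use database;
-- table_index is assumed to be the name given in the "in table" clause;
-- replace it with your own index table name.
describe table_index;
```

Because the index was created with deferred rebuild, the index table exists at this point but stays empty until the index is rebuilt.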

Version that did not work:

use database;
create index idx_table on table table(eid)
as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
with deferred rebuild
in table table_index
partitioned by (d,eid)
;

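Next, rebuild the index. If the table already has partitions with data, you can rebuild the index for a single partition or for the whole table. In my experience, though, creating the index before the table contains any data is faster, and I did not need to run this step again when data was later inserted into the table.

Rebuild the index for the whole table: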

use database;
alter index idx_table on table rebuild;

Rebuild the index for a single partition of the table:

use database;
alter index idx_table on table partition (d = '2019-04-01') rebuild;

Once the index is built, you can work with the table.

After rebuilding, you can list the indexes:

show formatted index on table;

Example result:

idx_name        tab_name    col_names   idx_tab_name      idx_type   comment
idx_tmp_ceshi   tmp_ceshi   eid         tmp_ceshi_index   compact

For the index to take effect at query time you also need to set a few parameters; by default the index is not used.

set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.optimize.index.filter=true;
set hive.optimize.index.filter.compact.minsize=0;
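To check whether a query is eligible to use the index, you can run it under explain after applying these settings. The sketch below reuses the tmp_ceshi table and eid column from the example result above; the filter value '12345' is a made-up placeholder. It only applies to Hive releases that still ship index support, which was removed in Hive 3.0.

```sql
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.optimize.index.filter=true;
set hive.optimize.index.filter.compact.minsize=0;

-- The index can only help when the query filters on the indexed column (eid here).
-- '12345' is a placeholder value, not real data.
explain
select eid
from tmp_ceshi
where eid = '12345';
```

The exact plan text varies by Hive version, so compare the plan with hive.optimize.index.filter set to true and to false rather than looking for one specific line.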

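This is also what solved my slow large-table query: after building an index on the large table and then running the join, the run time was about half of what it had been. Indexes only suit certain situations, though, so you still need to understand when they apply before using them. The next section covers bucketing, which is another direction for optimizing jobs.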
