A record of the Hive commands I use at work

Initialize the metastore schema in MySQL:

schematool -dbType mysql -initSchema

Start the metastore and HiveServer2 in the background, with logs written to /mnt:

nohup hive --service metastore 1>/mnt/metastore.log 2>&1 &
nohup hive --service hiveserver2 1>/mnt/hiveserver2.log 2>&1 &

beeline -u "jdbc:hive2:"

// 1. Authenticate with the keytab file

kinit -kt hive.keytab hive/tdh1

// 2. Connect to Hive with beeline

beeline -u "jdbc:hive2:/default;principal=hive/tdh1@tdh"

When authenticating with LDAP, add the -n (username) and -p (password) options:

beeline -u "jdbc:hive2://192.168.0.100:10000" -n hive -p 123456
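The two connection styles differ only in the JDBC URL and the extra -n/-p options. The Python sketch below assembles the beeline command line for each. It assumes the standard `jdbc:hive2://host:port/db;principal=...` URL form and the default HiveServer2 port 10000; the host `tdh1` and the helper names `hive2_url` and `beeline_cmd` are hypothetical placeholders, not part of the original setup.

```python
import shlex
from typing import List, Optional


def hive2_url(host: str, port: int = 10000, db: str = "default",
              principal: Optional[str] = None) -> str:
    """Build a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/db[;principal=...]."""
    url = f"jdbc:hive2://{host}:{port}/{db}"
    if principal:
        # Kerberos: the HiveServer2 service principal follows the database, separated by ';'
        url += f";principal={principal}"
    return url


def beeline_cmd(url: str, user: Optional[str] = None,
                password: Optional[str] = None) -> List[str]:
    """Return beeline's argv; -n/-p are only added for username/password (e.g. LDAP) logins."""
    cmd = ["beeline", "-u", url]
    if user is not None:
        cmd += ["-n", user]
    if password is not None:
        cmd += ["-p", password]
    return cmd


if __name__ == "__main__":
    # Kerberos: run `kinit -kt hive.keytab hive/tdh1` first; host "tdh1" is a placeholder
    print(shlex.join(beeline_cmd(hive2_url("tdh1", principal="hive/tdh1@tdh"))))
    # LDAP: address and credentials taken from the example above
    print(shlex.join(beeline_cmd(hive2_url("192.168.0.100"), "hive", "123456")))
```

`shlex.join` quotes the Kerberos URL because it contains `;`, which the shell would otherwise treat as a command separator; that is also why the URL must stay inside quotes when typed by hand.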

Create a table stored as Parquet:

create table student_parquet like student_txt stored as parquet;

Create a table stored as ORC:

create table student_orc like student_txt stored as orc;

Create a table stored as RCFile:

create table student_rc like student_txt stored as rcfile;

Create a table stored as SequenceFile:

create table student_seq like student_txt stored as sequencefile;

A handy mnemonic: the clauses ending in "-ed" belong in CREATE TABLE statements, e.g. partitioned by (partitioning), stored as (storage format), and clustered by (bucketing).

create table (name string, age int) partitioned by (year string);

create table (name string, age int) partitioned by (year string, `date` string);

show partitions ;
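Each partition of a partitioned table is stored as its own subdirectory, with one `column=value` level per partition column in PARTITIONED BY order, and SHOW PARTITIONS lists partitions by that same name. A minimal Python sketch of the naming, assuming the default warehouse path `/user/hive/warehouse` and a hypothetical table named `student`; the helper names are also hypothetical.

```python
from typing import List, Tuple

Spec = List[Tuple[str, str]]  # (partition column, value), in PARTITIONED BY order


def partition_name(spec: Spec) -> str:
    """Partition name as SHOW PARTITIONS prints it, e.g. year=2020/date=20200101."""
    return "/".join(f"{col}={val}" for col, val in spec)


def partition_dir(table_location: str, spec: Spec) -> str:
    """Directory holding one partition's files: one key=value level per partition column."""
    # Simplification: Hive also percent-escapes special characters in values (e.g. '/'); omitted here
    return f"{table_location.rstrip('/')}/{partition_name(spec)}"


if __name__ == "__main__":
    # Hypothetical table and values for the two-level (year, date) layout
    spec = [("year", "2020"), ("date", "20200101")]
    print(partition_name(spec))
    print(partition_dir("/user/hive/warehouse/student", spec))
```

Because the column order fixes the directory nesting, a filter on `year` alone can skip whole `year=...` subtrees without listing the `date=...` directories beneath them.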

alter table student_txt drop if exists partition (day='2020');

alter table student_txt add partition (day='2020');

select * from student_parquet where day = 2021; -- day is the partition column

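clustered by (bucketing column): a bucketing column is required. Hive hashes the bucket key and takes the result modulo the number of buckets.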

create table student_bucket_parquet (name string, age int) partitioned by (year string) clustered by (age) into 16 buckets stored as parquet;
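To make "hash the bucket key, then take the modulo" concrete, here is a Python sketch for an INT bucketing column like `age`. It assumes Hive's legacy bucketing scheme (bucketing_version=1), where the hash of an int is the value itself and the bucket is `(hash & Integer.MAX_VALUE) % numBuckets`; Hive 3 tables created with bucketing_version=2 use a Murmur3 hash instead, so real bucket ids can differ. The helper names and sample rows are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

JAVA_INT_MAX = 0x7FFFFFFF  # Integer.MAX_VALUE


def bucket_for_int(value: int, num_buckets: int) -> int:
    """Bucket id for an INT bucketing column under the legacy (bucketing_version=1) scheme."""
    # The Java hashCode of an int is the value itself; masking with Integer.MAX_VALUE
    # clears the sign bit so negative keys still map to a non-negative bucket id.
    return (value & JAVA_INT_MAX) % num_buckets


def assign_buckets(rows: Iterable[Tuple[str, int]],
                   num_buckets: int) -> Dict[int, List[Tuple[str, int]]]:
    """Group (name, age) rows by the bucket their age falls into, as CLUSTERED BY (age) would."""
    buckets: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for name, age in rows:
        buckets[bucket_for_int(age, num_buckets)].append((name, age))
    return dict(buckets)


if __name__ == "__main__":
    rows = [("alice", 18), ("bob", 34), ("carol", 19)]
    print(assign_buckets(rows, 16))  # 18 and 34 share bucket 2, since 34 % 16 == 2
```

Equal keys always land in the same bucket file, which is what lets bucketed joins and sampling work on matching buckets only.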

alter table ods_lhzb_lhzb_xxgl_tszs_xlxx set location 'hdfs://inceptot1/user/hive/warehouse/ods_lhzb.db/admin/ods_lhzb_lhzb_xxgl_tszs_xlxx_test';

alter table test01 set serdeproperties('field.delim'='\t');

alter table test01 set serdeproperties('serialization.format'='\t');

alter table student change column name name int comment '姓名';

insert into student_score select stu.s_id,stu.s_name,sc.s_score from student stu join score sc on stu.s_id = sc.s_id;

insert overwrite table student_score select stu.s_id,stu.s_name,sc.s_score from student stu join score sc on stu.s_id = sc.s_id;

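Note: within one multi-insert statement, the same table cannot be a target twice, but different partitions of the same table can be.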

from student_txt
insert overwrite table student_parquet partition(day)
select name, min(age), min(day) group by name
insert into table student_parquet partition(day)
select name, max(age), max(day) group by name;
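In a dynamic-partition insert, Hive takes the partition value from the trailing column(s) of the SELECT, so each row's `min(day)` or `max(day)` decides which `day=` partition it lands in (this needs the dynamic partition settings shown further down in this post). A Python sketch of that routing on made-up rows; the helper names are hypothetical, and it models only which partition each aggregated row goes to, not overwrite-versus-append semantics.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[str, int, str]  # (name, age, day), mirroring the hypothetical student_txt rows below


def group_by_name(rows: Iterable[Row], agg: Callable) -> List[Row]:
    """SELECT name, agg(age), agg(day) ... GROUP BY name."""
    groups: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for name, age, day in rows:
        groups[name].append((age, day))
    return [(name, agg(a for a, _ in vals), agg(d for _, d in vals))
            for name, vals in groups.items()]


def route_dynamic(rows: Iterable[Row]) -> Dict[str, List[Tuple[str, int]]]:
    """PARTITION (day) with no static value: the last SELECT column picks the partition."""
    partitions: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for name, age, day in rows:
        partitions[f"day={day}"].append((name, age))
    return dict(partitions)


if __name__ == "__main__":
    student_txt = [("alice", 18, "2020"), ("alice", 20, "2021"), ("bob", 19, "2020")]
    print(route_dynamic(group_by_name(student_txt, min)))  # first INSERT branch
    print(route_dynamic(group_by_name(student_txt, max)))  # second INSERT branch
```

The multi-insert form scans student_txt once and feeds both branches, which is the main reason to prefer it over two separate INSERT statements.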

desc ;

Use desc formatted to view numFiles, totalSize, and other details:

desc formatted ;

desc database ;

View the execution plan of a SQL statement by putting explain in front of it:

explain select * from student_txt;

View extended information for the execution plan:

explain extended select * from student_txt;

View the input dependencies of a query:

explain dependency select * from student_parquet;

View the authorization (permission) information involved in a query:

explain authorization select * from student_parquet;

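Table statistics usually include the number of files (numFiles), the total file size (totalSize), the total number of rows (numRows), the number of partitions (numPartitions), and the uncompressed raw data size (rawDataSize).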

analyze table compute statistics;
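For a plain text table, the basic numbers can be approximated by walking the table directory. The Python sketch below assumes an uncompressed TEXTFILE table at a local filesystem path (real tables live on HDFS) and counts one row per line. It skips names starting with `_` or `.` the way Hadoop input formats do, and it does not attempt numPartitions or rawDataSize. `basic_stats` is a hypothetical helper, not a Hive API.

```python
import os
import tempfile
from typing import Dict


def basic_stats(table_dir: str) -> Dict[str, int]:
    """Approximate numFiles, totalSize and numRows for a text-format table directory."""
    num_files = total_size = num_rows = 0
    for root, dirs, files in os.walk(table_dir):
        # Hadoop input formats ignore names starting with '_' or '.' (e.g. _SUCCESS)
        dirs[:] = [d for d in dirs if not d.startswith(("_", "."))]
        for name in files:
            if name.startswith(("_", ".")):
                continue
            path = os.path.join(root, name)
            num_files += 1
            total_size += os.path.getsize(path)
            with open(path, "rb") as f:
                num_rows += sum(1 for _ in f)  # one row per line in a TEXTFILE table
    return {"numFiles": num_files, "totalSize": total_size, "numRows": num_rows}


if __name__ == "__main__":
    # Build a throwaway directory that looks like a small text table with one data file
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "000000_0"), "wb") as f:
            f.write(b"alice,18\nbob,19\n")
        open(os.path.join(d, "_SUCCESS"), "wb").close()
        print(basic_stats(d))  # {'numFiles': 1, 'totalSize': 16, 'numRows': 2}
```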

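Enable dynamic partitioning. Nonstrict mode lets every partition column be dynamic; strict mode requires at least one static partition column:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;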

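hive.merge.size.per.task sets the target size of each file after the merge job. Dividing the total size of the previous job's output files by this value gives the number of reducers for the merge job. The map side of the merge job acts as an identity map; the data is then shuffled to the reducers, and each reducer writes out one file. This is how the number and size of the output files are controlled.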

hive.merge.size.per.task // default: 256000000 bytes (256 MB)

set hive.merge.mapredfiles=true; // also merge the small files produced by map-reduce jobs (default false)
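The reducer arithmetic described above, as a Python sketch. It uses the documented defaults for hive.merge.size.per.task (256000000 bytes) and hive.merge.smallfiles.avgsize (16000000 bytes, the average output file size below which Hive schedules a merge job). The function names are hypothetical, and this models only the arithmetic, not Hive's actual planner.

```python
DEFAULT_SIZE_PER_TASK = 256_000_000      # hive.merge.size.per.task default, in bytes
DEFAULT_SMALLFILES_AVGSIZE = 16_000_000  # hive.merge.smallfiles.avgsize default, in bytes


def needs_merge(file_sizes, avgsize=DEFAULT_SMALLFILES_AVGSIZE) -> bool:
    """A merge job is added when the average output file size is below the threshold."""
    return bool(file_sizes) and sum(file_sizes) / len(file_sizes) < avgsize


def merge_reducers(file_sizes, size_per_task=DEFAULT_SIZE_PER_TASK) -> int:
    """Reducer count for the merge job: total output size / target size, rounded up."""
    total = sum(file_sizes)
    return max(1, -(-total // size_per_task))  # integer ceiling division


if __name__ == "__main__":
    small_files = [1_000_000] * 1000  # 1000 output files of 1 MB each
    print(needs_merge(small_files))     # True: the 1 MB average is below 16 MB
    print(merge_reducers(small_files))  # 4: 1000 MB / 256 MB, rounded up
```

With these numbers, 1000 small files collapse into 4 files of about 250 MB each.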

That's all for now!
