Hive Concise Notes: Creating Tables, Importing and Exporting Data

Hive has one default database:

Database name: default

Database directory: hdfs://hdp20-01:9000/user/hive/warehouse

Create a new database:

create database db_order;

Once the database is created, a database directory is generated in HDFS:

hdfs://hdp20-01:9000/user/hive/warehouse/db_order.db

use db_order;

create table t_order(id string,create_time string,amount float,uid string);

Once the table is created, a table directory is generated inside the directory of the database it belongs to:

/user/hive/warehouse/db_order.db/t_order

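However, with a table created this way, Hive assumes that the field delimiter in the table's data files is ^A (the Ctrl-A character, \001).

For comma-separated data, the correct create table statement is:

create table t_order(id string,create_time string,amount float,uid string)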

row format delimited

fields terminated by ',';

This specifies that the field delimiter in the table's data files is ",".
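
To check which delimiter and directory Hive actually recorded for a table, the table's metadata can be inspected from the Hive shell. A minimal sketch, assuming the t_order table above exists in db_order; t_order_default is a hypothetical table name used only to show that the default delimiter ^A can also be written explicitly as '\001':

```sql
use db_order;

-- Shows the table's location, table type, and storage settings,
-- including the field delimiter recorded for the table.
describe formatted t_order;

-- Prints a create table statement reconstructed from the metastore.
show create table t_order;

-- Hypothetical table: spells out the default delimiter ^A (octal 001) explicitly.
create table t_order_default(id string,create_time string,amount float,uid string)
row format delimited
fields terminated by '\001';
```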

drop table t_order;

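The effect of dropping a table is:

Hive removes the information about this table from the metastore database;

Hive also deletes the table's directory from HDFS (this applies to managed tables; for an external table only the metadata is removed).

Managed table (managed_table): the table directory is laid out according to Hive's conventions and lives in the Hive warehouse directory /user/hive/warehouse

External table (external_table): the table directory is specified by the user who creates the table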

create external table t_access(ip string,url string,access_time string)

row format delimited

fields terminated by ','

location '/access/log';

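Differences between external tables and managed tables:

1. A managed table's directory lives in the Hive warehouse directory, whereas an external table's directory is specified by the user

In a Hive data warehouse, the lowest-level tables always come from external systems. To avoid interfering with how those systems work, create external tables in Hive that map onto the data directories those systems produce;

then, for the various tables produced by subsequent ETL operations, managed tables are recommended.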
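
The practical consequence of this split shows up when a table is dropped. The sketch below assumes the external table t_access defined above, with its data under /access/log. On a default configuration, dropping an external table removes only the metadata, but this is worth verifying on your own cluster:

```sql
-- "Table Type" in the output should read EXTERNAL_TABLE.
describe formatted t_access;

-- Removes the table definition from the metastore.
drop table t_access;

-- Run from the Hive shell: the files written by the external system
-- should still be listed, because Hive does not own this directory.
dfs -ls /access/log;
```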

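The essence of a partitioned table is that partition subdirectories are created for the data files inside the table directory, so that at query time the MR program can process only the data in the relevant partition subdirectories, narrowing the range of data that is read.

For example, browsing records are produced every day and should be stored in one table, but sometimes we only need to analyze the browsing records of one particular day.

In that case, the table can be created as a partitioned table, with each day's data loaded into one of its partitions;

of course, each day's partition directory needs a directory name (the partition field).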

1.2.4.1. Example with one partition field:

The example is as follows:

1. Create a partitioned table

create table t_access(ip string,url string,access_time string)

partitioned by(dt string)

row format delimited

fields terminated by ',';

Note: a partition field must not be a field that already exists in the table definition.

2. Load data into partitions

load data local inpath '/root/access.log.2017-08-04.log' into table t_access partition(dt='20170804');

load data local inpath '/root/access.log.2017-08-05.log' into table t_access partition(dt='20170805');

3. Query the partitioned data

a. Count the total PV for August 4:

select count(*) from t_access where dt='20170804';

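In essence: the partition field is used like a table column, so a where clause can select the partition.

b. Count the total PV over all data in the table:

select count(*) from t_access;

In essence: simply omit the partition condition.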
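
Two further checks are often handy with a partitioned table: listing the partitions Hive knows about, and aggregating per partition. A minimal sketch, assuming t_access was created in db_order (the notes do not say which database it lives in, so the directory path is an assumption):

```sql
-- Lists one line per partition, in the form dt=20170804.
show partitions t_access;

-- Each partition is a subdirectory of the table directory,
-- named <partition field>=<value>.
dfs -ls /user/hive/warehouse/db_order.db/t_access;

-- PV per day: the partition field can be grouped on like any column.
select dt, count(*) as pv from t_access group by dt;
```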

1.2.4.2. Example with multiple partition fields

Create the table: create table t_partition(id int,name string,age int)

partitioned by(department string,*** string,howold int)

row format delimited fields terminated by ',';

Load data:

load data local inpath '/root/p1.dat' into table t_partition partition(department='xiangsheng',***='male',howold=20);

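A table can be created from an existing table:

1. create table t_user_2 like t_user;

The new table t_user_2 has the same structure as the source table t_user, but contains no data.

2. Create a table and insert data at the same time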

create table t_access_user

as select ip,url from t_access;

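t_access_user is created with columns taken from the fields of the select query, and the query results are inserted into the new table.

Method 1: one way to import data:

manually put the file into the table directory using HDFS commands;

Method 2: in Hive's interactive shell, use a Hive command to load local data into the table directory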

hive>load data local inpath '/root/order.data.2' into table t_order;

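Method 3: use a Hive command to load a data file that is already in HDFS into the table directory

hive>load data inpath '/access.log.2017-08-06.log' into table t_access partition(dt='20170806');

Note: the difference between loading a local file and loading an HDFS file:

Loading a local file into a table: the file is copied

Loading an HDFS file into a table: the file is moved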
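
The copy-versus-move difference can be observed directly from the Hive shell. A sketch under assumptions: /root/order.data.1 is a hypothetical local file, and the warehouse path assumes t_order lives in db_order as created earlier:

```sql
-- Method 1 without leaving the Hive shell: put a local file into the table directory.
dfs -put /root/order.data.1 /user/hive/warehouse/db_order.db/t_order/;

-- After the Method 3 load has run, the source file has been moved,
-- so listing its original HDFS path should report that it no longer exists.
dfs -ls /access.log.2017-08-06.log;

-- After the Method 2 load, the local source file is still in place (it was copied).
!ls /root/order.data.2;
```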

1. Export data from a Hive table to files in HDFS

insert overwrite directory '/root/access-data'

row format delimited fields terminated by ','

select * from t_access;

2. Export data from a Hive table to files on the local disk

insert overwrite local

directory '/root/access-data'

row format delimited fields terminated by ','

select * from t_access limit 100000;
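
After either export, the result is a directory of files rather than a single file, and insert overwrite replaces whatever the target directory already contained. A minimal sketch for inspecting the output from the Hive shell; 000000_0 is the typical name of the first output file, and the actual names depend on how many tasks wrote output:

```sql
-- HDFS export: list the files written under the target directory.
dfs -ls /root/access-data;

-- Local export: "!" runs a shell command on the machine running the Hive CLI.
!ls /root/access-data;

-- Peek at the first lines to confirm the ',' delimiter was applied.
!head -n 3 /root/access-data/000000_0;
```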

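Hive supports many file formats: sequence file | text file | parquet file | rc file. The three statements below are alternative definitions of the same table name, so only one of them can be created at a time:

create table t_pq(movie string,rate int) stored as textfile;

create table t_pq(movie string,rate int) stored as sequencefile;

create table t_pq(movie string,rate int) stored as parquetfile;

Demo: 1. First create a table stored as text files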

create table t_access_text(ip string,url string,access_time string)

row format delimited fields terminated by ','

stored as textfile;

Load text data into the table:

load data local inpath '/root/access-data/000000_0' into table t_access_text;

2. Create a table stored as sequence files:

create table t_access_seq(ip string,url string,access_time string)

stored as sequencefile;

Query the data from the text table and insert it into the sequencefile table; the data files generated this way are in sequencefile format:

insert into t_access_seq

select * from t_access_text;

3. Create a table stored as parquet files:

create table t_access_parq(ip string,url string,access_time string)

stored as parquetfile;
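
As with the sequencefile table, the parquet table is populated by querying the text table, since load data only places files in the table directory and does not convert their format. A minimal sketch, assuming t_access_text from step 1 already holds data and that both tables were created in the default database (the warehouse path below depends on that assumption):

```sql
-- Hive rewrites the rows as parquet while executing the query.
insert into t_access_parq
select * from t_access_text;

-- The row counts of the two tables should match.
select count(*) from t_access_parq;

-- The data files listed here are binary parquet, not readable text.
dfs -ls /user/hive/warehouse/t_access_parq;
```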
