Programming Hive study notes 01

Chapter 4: HiveQL: Data Definition

1: Create a database

create database financials;

create database if not exists financials;

2: List databases

show databases;

Filter the list with a pattern (here, names starting with h):

show databases like 'h.*';

3: Create a database at a non-default location

create database financials location '/my/preferred/directory';

4: Add a descriptive comment to a database

create database financials comment 'holds all financials tables';

5: Show a database's description

describe database financials;

6: Attach key-value properties to a database

create database financials
with dbproperties ('creator' = 'mark moneybags', 'date' = '2012-12-12');

describe database extended financials;

7: No command shows which database is currently in use, but it is always safe to repeat use:

use financials;

Setting a property makes the CLI prompt display the current database:

set hive.cli.print.current.db=true;

set hive.cli.print.current.db=false;

8: Drop a database

drop database if exists financials;

By default Hive does not allow dropping a database that still contains tables.

Adding the cascade keyword makes Hive drop the tables in the database first:

drop database if exists financials cascade;

9: Alter a database by setting dbproperties key-value pairs

alter database financials set dbproperties ('edited-by' = 'joe dba');

10: Create a table

create table if not exists employees (
  name string comment 'employee name',
  salary float comment 'employee salary',
  subordinates array<string> comment 'names of subordinates',
  deductions map<string, float>,
  address struct<street:string, city:string, state:string, zip:int>
)
comment 'description of the table'
location '/user/hive/warehouse/mydb.db/employees'
tblproperties ('creator' = 'me', 'created_at' = '2012-12-12');

-- tblproperties attaches extra documentation to the table as key-value pairs

11: List a table's tblproperties

show tblproperties employees;

12: Copy a table's schema

create table if not exists mydb.employees2 like mydb.employees;

13: Select a database

use mydb;

List its tables:

show tables;

Or list the tables of a database without switching to it:

show tables in mydb;

14: View a table's detailed structure

describe extended mydb.employees;

The formatted keyword can be used in place of extended for more readable output:

describe formatted mydb.employees;

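15: Managed (internal) tables: dropping a managed table also deletes its data.

Create an external table that reads all comma-delimited data files under the /data/stocks directory: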

create external table if not exists stocks (
  exchange string,
  symbol string,
  ymd string,
  price_open float,
  price_high float,
  price_low float,
  price_close float,
  volume int,
  price_adj_close float
)
row format delimited fields terminated by ','
location '/data/stocks';
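To make the difference from a managed table concrete, here is a minimal sketch of what dropping the external stocks table does. It assumes the table was created with the statement above.

```sql
-- Dropping an external table removes only its metastore entry.
drop table if exists stocks;

-- The files under /data/stocks are left in place, so the same
-- create external table statement can be re-run over them later.
```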

16: Check whether a table is managed or external

describe extended tablename;

The output includes one of:

tableType:MANAGED_TABLE -- managed table

tableType:EXTERNAL_TABLE -- external table

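-- Copy a table's schema without copying its data
-- (mydb.employees3 is the new table, mydb.employees2 is the source table)

create table if not exists mydb.employees3
like mydb.employees2 location '/data/stocks';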

17: Create a partitioned table

create table employees (
  name string,
  salary float,
  subordinates array<string>,
  deductions map<string, float>,
  address struct<street:string, city:string, state:string, zip:int>
)
partitioned by (country string, state string);

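Partition columns:

country string and state string behave like ordinary columns in queries, but they act somewhat like an index:

filtering on them lets Hive read only the matching partition directories, which improves query performance.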
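As an illustration of partition filtering, the query below restricts both partition columns. It is a sketch against the employees table defined above; the values 'us' and 'ca' are simply the ones used elsewhere in these notes.

```sql
-- Only the country=us/state=ca partition directory needs to be scanned.
select name, salary
from employees
where country = 'us' and state = 'ca';
```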

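18: set hive.mapred.mode=strict;

In strict mode, a query on a partitioned table whose where clause has no partition filter

is rejected before the job is submitted.

The mode can be switched back with: set hive.mapred.mode=nonstrict;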
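A short sketch of the strict-mode behaviour, assuming the partitioned employees table from item 17; the limit values are arbitrary.

```sql
set hive.mapred.mode=strict;

-- No partition filter: Hive refuses to run this in strict mode.
select e.name, e.salary from employees e limit 100;

-- With a filter on a partition column, the query is accepted.
select e.name, e.salary from employees e where e.country = 'us' limit 100;

-- Return to the permissive setting.
set hive.mapred.mode=nonstrict;
```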

19: List all partitions of a table

show partitions employees;

20: List only the partitions that match a specific partition key value

show partitions employees partition(country='us');

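describe extended employees also shows the partition keys.

The most common setup for managing large production data sets is an external partitioned table.

21: In a managed table, partitions can be created by loading data: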

load data local inpath '/home/hive/california-employees'

into table employees

partition(country='us',state='ca');

Hive creates the directory for this partition: .../employees/country=us/state=ca

22: Create an external partitioned table

create external table if not exists log_messages (
  hms int,
  severity string,
  server string,
  process_id int,
  message string
)
partitioned by (year int, month int, day int)
row format delimited fields terminated by '\t';
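A newly created partitioned table has no partitions yet; for an external table, each one is typically registered with alter table ... add partition. The sketch below uses a hypothetical HDFS path, so substitute the directory where that day's log files actually live.

```sql
-- Register one day's data as a partition (the location is a made-up example path).
alter table log_messages add if not exists partition (year = 2012, month = 1, day = 2)
location '/data/log_messages/2012/01/02';

-- Confirm that the partition is now visible.
show partitions log_messages;
```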

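1: order by performs a global sort of the input.

2: sort by makes each reducer's output file sorted; a second merge sort over those sorted files then yields a total order.

Properties of sort by:

1). sort by is largely unaffected by whether hive.mapred.mode is strict or nonstrict, although a query on a partitioned table still has to specify a partition.

2). Within a single reducer, sort by orders the rows by the specified columns.

3). The number of reducers can be set, e.g. set mapred.reduce.tasks=5; merge-sorting the sorted outputs then gives the fully ordered result.

Result: in strict mode, sort by runs normally without a limit clause, whereas order by requires one. sort by is only slightly affected by hive.mapred.mode=strict.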
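The two clauses can be compared side by side. This sketch reuses the stocks table from item 15; the chosen columns and the reducer count are only examples.

```sql
-- Global ordering: all rows pass through a single reducer.
select s.ymd, s.symbol, s.price_close
from stocks s
order by s.ymd asc, s.symbol desc;

-- Per-reducer ordering: each reducer's output is sorted on its own.
set mapred.reduce.tasks=5;

select s.ymd, s.symbol, s.price_close
from stocks s
sort by s.ymd asc, s.symbol desc;
```

With more than one reducer, the sort by result is ordered within each output file but not across files.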

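3: distribute by

distribute by controls how map output is divided among the reducers. Rows are distributed according to the columns after distribute by and the number of reducers, using hashing by default. distribute by also accepts an expression: with length, for example, rows are sent to reducers according to the length of a string column and end up in different output files. length is a built-in function; other built-in or user-defined functions work as well.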
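A sketch of both forms described above, again assuming the stocks table from item 15:

```sql
-- All rows for one symbol go to the same reducer and are sorted by date within it.
select s.ymd, s.symbol, s.price_close
from stocks s
distribute by s.symbol
sort by s.symbol asc, s.ymd asc;

-- Distribute on an expression instead: the length of the symbol string.
select s.ymd, s.symbol, s.price_close
from stocks s
distribute by length(s.symbol);
```

When both clauses appear, distribute by has to come before sort by.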

4: cluster by

cluster by does what distribute by does and additionally sorts by the same columns, so cluster by = distribute by + sort by on those columns.
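The equivalence can be written out as two queries. This is a sketch against the stocks table from item 15; both statements distribute and sort on s.symbol.

```sql
-- Shorthand form.
select s.ymd, s.symbol, s.price_close
from stocks s
cluster by s.symbol;

-- Equivalent long form.
select s.ymd, s.symbol, s.price_close
from stocks s
distribute by s.symbol
sort by s.symbol;
```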
