Hive Study Notes 1

1. A simple word count

select word,count(1) from
(select explode(split(sentence,' ')) as word from t2
) t group by word;

Splits the sentence column of table t2 on spaces and counts how many times each word occurs.
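The same count can also be written with lateral view instead of a subquery. This is only a sketch, assuming the same t2(sentence string) table; the aliases w and n are illustrative names, not part of the original notes.

```sql
-- lateral view joins each row of t2 with the rows produced by explode(),
-- so every word of every sentence becomes its own row
select w.word, count(1) as n
from t2
lateral view explode(split(sentence, ' ')) w as word
group by w.word;
```

The lateral view form is handy when other columns of the source row are needed next to each exploded word.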

select word,count(1) as n from
(select explode(split(sentence,' ')) as word from t2
) t group by word
order by n desc;

Sorts the words by count in descending order. Note that order by runs with only a single reducer.
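Because order by sends every row through one reducer, it can become a bottleneck on large results. Two possible variations are sketched here against the same t2 table; the limit of 10 is an arbitrary example value.

```sql
-- Option 1: keep the total order, but cap the result size with limit
-- (strict mode requires a limit together with order by)
select word, count(1) as n
from (select explode(split(sentence, ' ')) as word from t2) t
group by word
order by n desc
limit 10;

-- Option 2: sort by orders rows within each reducer only,
-- so the combined output is not globally sorted
select word, count(1) as n
from (select explode(split(sentence, ' ')) as word from t2) t
group by word
sort by n desc;
```

Option 2 trades the global ordering for parallelism, so it only fits cases where a per-reducer ordering is enough.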

2. Creating tables: internal (managed) tables and external tables

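-- create an internal (managed) table, partitioned by dt
create table t3(sentence string)
partitioned by(dt string)
row format delimited fields terminated by '\n';

-- load local data into the internal table
-- (t3 is partitioned, so the load names a target partition; dt='201911' is an example value)
load data local inpath 'local path' into table t3 partition(dt='201911');

-- create an external table over the data in the HDFS directory /file
create external table t2(sentence string)
row format delimited fields terminated by '\n'
stored as textfile
location '/file';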
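One way to check which kind of table was created is describe formatted, whose output includes a Table Type field (MANAGED_TABLE or EXTERNAL_TABLE) and the storage location. The commented-out drop statements are there only to illustrate the practical difference between the two kinds.

```sql
-- expect Table Type: MANAGED_TABLE
describe formatted t3;

-- expect Table Type: EXTERNAL_TABLE, with Location pointing at /file
describe formatted t2;

-- drop table t3;   -- would remove the metadata and the data files
-- drop table t2;   -- would remove only the metadata; files under /file stay
```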

List the partitions of a table

show partitions tablename;

Insert data into a partition

insert overwrite table t3 partition(dt='201911')
select * from t2 limit 100;
-- inserts 100 rows from t2 into partition 201911 of table t3
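The statement above names the partition explicitly (a static partition). When the partition value should come from the data itself, a dynamic-partition insert can be used instead. The following is a sketch that assumes a hypothetical staging table t2_dated(sentence string, dt string), which does not appear in the original notes.

```sql
-- allow Hive to derive partition values from the query result
set hive.exec.dynamic.partition = true;
set hive.exec.dynamic.partition.mode = nonstrict;

-- the partition column (dt) must be the last column in the select list
insert overwrite table t3 partition(dt)
select sentence, dt from t2_dated;
```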

Filter data by partition

select * from t3 where dt between '201911' and '201912';

-- returns the data in partitions 201911 through 201912

Bucketing a table: create a table with 4 buckets

set hive.enforce.bucketing = true;
create table t1(
user_id int,
item_id string,
rating string
) clustered by(user_id)
into 4 buckets;
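The create statement only declares the bucketing; rows land in buckets when the table is populated through an insert. Here is a minimal sketch, assuming a hypothetical source table ratings_raw(user_id int, item_id string, rating string) that is not part of the original notes.

```sql
-- with bucketing enforced, Hive uses one reducer per bucket,
-- so the insert writes 4 files, one for each bucket
set hive.enforce.bucketing = true;

-- each row is assigned to a bucket by hashing user_id modulo 4
insert overwrite table t1
select user_id, item_id, rating from ratings_raw;
```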

Sample 1/4 of the data by bucket

select * from t1 tablesample(bucket 1 out of 4 on user_id);
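In tablesample(bucket x out of y on col), y controls the sampled fraction and x selects which slice to read. The variations below are sketched against the same 4-bucket table t1 to show how y relates to the bucket count.

```sql
-- y = 2 with 4 buckets: the sample covers 4/2 = 2 buckets (the 1st and 3rd),
-- roughly half of the data
select * from t1 tablesample(bucket 1 out of 2 on user_id);

-- y = 8 with 4 buckets: the sample is half of the 1st bucket,
-- roughly one eighth of the data
select * from t1 tablesample(bucket 1 out of 8 on user_id);
```

Sampling on the clustering column lets Hive read only the matching bucket files instead of scanning the whole table.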

Create table t2 from a bucket sample

create table t2 as select * from t1 tablesample(bucket 1 out of 4 on user_id);
