Learning Hadoop: An Introduction to Hive

1. Basic architecture of Hive

2. Data storage in Hive

For example, if the table tbl_pv has two partition columns, ds and ctry, then the HDFS subdirectory for ds = 20090801, ctry = us is /wh/tbl_pv/ds=20090801/ctry=us, and the subdirectory for ds = 20090801, ctry = ca is /wh/tbl_pv/ds=20090801/ctry=ca.
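The mapping from partition values to a directory can be sketched in a few lines of Python. This is only an illustration of the layout described above, not Hive's own code: the function name partition_path and its parameters are hypothetical, and the sketch ignores the escaping Hive applies to special characters in partition values.

```python
def partition_path(warehouse_dir, table, partition_spec):
    """Build the HDFS directory for one partition.

    partition_spec is an ordered list of (column, value) pairs,
    in the order the partition columns were declared.
    Hypothetical helper for illustration only.
    """
    # Each partition column becomes one "column=value" directory level.
    levels = [f"{col}={val}" for col, val in partition_spec]
    return "/".join([warehouse_dir.rstrip("/"), table] + levels)


print(partition_path("/wh", "tbl_pv", [("ds", "20090801"), ("ctry", "us")]))
# prints /wh/tbl_pv/ds=20090801/ctry=us
```

Because every partition value gets its own directory level, a query that filters on ds or ctry only needs to read the matching subdirectories.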

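Buckets: Hive computes a hash of a specified column and splits the data according to the hash value, so that each bucket corresponds to one file. Buckets can be used for sampling: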

create table sales (id int, name string)
partitioned by (ds string)
clustered by (id) into 32 buckets;

select id from sales tablesample (bucket 1 out of 32);
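To make the idea of bucketing concrete, here is a minimal Python sketch of how rows could be assigned to 32 buckets and how sampling "bucket 1 out of 32" then selects a subset. It rests on assumptions that are not stated in the text: that the hash of an int column is the int value itself, that the sign bit is masked before taking the modulus, and that bucket 1 corresponds to remainder 0. The helper names bucket_index and sample_bucket are hypothetical.

```python
NUM_BUCKETS = 32
INT_MAX = 0x7FFFFFFF  # mask that clears the sign bit of a 32-bit int


def bucket_index(id_value, num_buckets=NUM_BUCKETS):
    # Assumption: an int column hashes to its own value;
    # masking keeps the result non-negative before the modulus.
    return (id_value & INT_MAX) % num_buckets


def sample_bucket(rows, bucket_number, num_buckets=NUM_BUCKETS):
    # bucket_number is 1-based, as in "bucket 1 out of 32".
    wanted = bucket_number - 1
    return [r for r in rows if bucket_index(r["id"], num_buckets) == wanted]


rows = [{"id": i, "name": f"name{i}"} for i in range(100)]
print([r["id"] for r in sample_bucket(rows, 1)])
# prints [0, 32, 64, 96]
```

Since each bucket is stored as its own file, a sample of this kind only needs to read one of the 32 files rather than scanning the whole table.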

