Big Data Training Course: Hive Configuration Summary

# ------------------Configuration--------------------------

0. Where settings are stored

In the .hiverc file in the $HOME directory

1. Show the database name in the prompt

set hive.cli.print.current.db=true;

2. Prefer local mode for execution

set hive.exec.mode.local.auto=true;

3. Print column names

set hive.cli.print.header=true;
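
As a minimal sketch, the CLI settings from items 1 to 3 could be collected in the .hiverc file mentioned in item 0, so that they apply to every new Hive CLI session. The file contents below are only an illustration, not a recommended baseline.

```sql
-- Illustrative ~/.hiverc: the Hive CLI runs these statements at startup
set hive.cli.print.current.db=true;  -- show the current database in the prompt
set hive.cli.print.header=true;      -- print column names above query results
set hive.exec.mode.local.auto=true;  -- let Hive pick local mode for small jobs
```

Running `set hive.cli.print.header;` (a key with no value) in the CLI prints the current value, which is a quick way to check that the file was picked up.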

4. Strict / non-strict mode

set hive.mapred.mode=strict;

set hive.mapred.mode=nonstrict;
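
To illustrate what strict mode changes, here is a small sketch. The table `logs` and its columns are hypothetical and exist only for this example. In strict mode Hive rejects, among other things, queries on a partitioned table that have no partition filter, and ORDER BY without LIMIT.

```sql
-- Hypothetical partitioned table, used only for illustration
CREATE TABLE IF NOT EXISTS logs (msg STRING) PARTITIONED BY (dt STRING);

set hive.mapred.mode=strict;

-- Rejected in strict mode: no filter on the partition column dt
-- SELECT msg FROM logs;

-- Accepted: the partition filter limits the scan
SELECT msg FROM logs WHERE dt = '2020-01-01';

-- Rejected in strict mode: ORDER BY without LIMIT
-- SELECT msg FROM logs WHERE dt = '2020-01-01' ORDER BY msg;

-- Accepted: ORDER BY with LIMIT
SELECT msg FROM logs WHERE dt = '2020-01-01' ORDER BY msg LIMIT 10;
```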

5. Enable dynamic partitioning

set hive.exec.dynamic.partition=true;

# --------------Dynamic partition tuning--------------------

6. Set the dynamic partition mode

set hive.exec.dynamic.partition.mode=strict;

7. Set the total number of dynamic partitions allowed

set hive.exec.max.dynamic.partitions=300000;

8. Set the number of dynamic partitions allowed per node

set hive.exec.max.dynamic.partitions.pernode=10000;
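
The dynamic partition settings above can be sketched in use as follows. The tables `staging_employees` and `employees` are hypothetical. With the mode set to strict, at least one partition column must be given a static value; nonstrict lets every partition column be dynamic.

```sql
-- Hypothetical tables: staging_employees(name, country, state) and
-- employees(name) partitioned by (country, state)
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=strict;

-- country is static, state is dynamic.
-- The dynamic partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE employees
PARTITION (country = 'US', state)
SELECT name, state
FROM staging_employees
WHERE country = 'US';
```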

9. Set the maximum number of files that can be created globally

set hive.exec.max.created.files=100000;

10. Enable map-side join

set hive.auto.convert.join=true;

11. Set the small-table size threshold (bytes)

set hive.mapjoin.smalltable.filesize=25000000;
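
A short sketch of how items 10 and 11 work together: when one side of a join is smaller than the threshold, Hive can convert the join into a map-side join without a hint. The tables `orders` and `countries` are hypothetical.

```sql
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=25000000;

-- Hypothetical tables: orders is large, countries is well under the threshold
SELECT o.order_id, c.country_name
FROM orders o
JOIN countries c ON o.country_code = c.country_code;
```

Prefixing the query with `EXPLAIN` shows the plan, which is one way to see whether the join was actually converted.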

12. Force inserts to follow the table's bucket definition

set hive.enforce.bucketing=true;
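
As an illustration of item 12, the sketch below creates a bucketed table and fills it with bucketing enforced. The table names, columns, and the bucket count of 8 are assumptions made for this example.

```sql
-- Hypothetical bucketed table: 8 buckets on user_id
CREATE TABLE IF NOT EXISTS users_bucketed (user_id INT, name STRING)
CLUSTERED BY (user_id) INTO 8 BUCKETS;

set hive.enforce.bucketing=true;

-- Hive sets the number of reducers to match the bucket count,
-- so each reducer writes one bucket file.
-- users_raw is a hypothetical unbucketed source table.
INSERT OVERWRITE TABLE users_bucketed
SELECT user_id, name
FROM users_raw;
```

In Hive 2.x and later this property was, as far as I know, removed and bucketing is always enforced, so the `set` line mainly matters on older versions.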

13. LIMIT optimization

-- Enable the LIMIT optimization

set hive.limit.optimize.enable=true;

-- Size each row is guaranteed to have at least when LIMIT samples a subset of the data

set hive.limit.row.max.size=10000;

-- Maximum number of files that LIMIT may sample

set hive.limit.optimize.limit.file=10;

14. Compression

-- Enable Hive intermediate compression (data passed between map and reduce)

set hive.exec.compress.intermediate=true;

-- Enable Hadoop intermediate compression (map output passed to reduce)

set mapred.compress.map.output=true;

-- Enable Hive final compression (data written by reduce)

set hive.exec.compress.output=true;
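
The switches above only turn compression on; which codec is used is configured separately. A minimal sketch, using the older `mapred.*` property names to match the setting in this item. The codec choices are assumptions for illustration, since these notes do not name one.

```sql
-- Assumed codecs, for illustration only
-- Codec for map output (intermediate data); Snappy needs native libraries on the cluster
set mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Codec for the final job output
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
```

Newer Hadoop releases use `mapreduce.*` names for these properties, so check which names your cluster version expects.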

15. Storage location of the data warehouse

Set in hive-default.xml; usually left unchanged

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
