Connecting to a Hive Data Warehouse with PyHive

PyHive is a lightweight library, written in Python, for working with Hive.

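from pyhive import hive

# Open a connection to HiveServer2. The class is hive.Connection (capital C),
# and PyHive expects the auth mode in upper case.
conn = hive.Connection(host='192.168.0.1',
                       port=10000,
                       auth='CUSTOM',
                       database='gld',
                       username='hive',
                       password='hive')
cursor = conn.cursor()
cursor.execute('select * from student limit 10')
# fetchall() returns every remaining row of the result set
for result in cursor.fetchall():
    print(result)
# Release the cursor and the connection when done
cursor.close()
conn.close()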

Output:

('1', '孫悟空', '男', '18', '01')
('2', '明世隱', '男', '19', '01')
('3', '高漸離', '男', '20', '02')
('4', '孫尚香', '女', '21', '02')
('5', '安琪拉', '女', '22', '03')

Here, cursor.fetchall() returns a list object, and each element of that list is a tuple.
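Because each row is a bare tuple, it can be convenient to pair the values with column names. The following is a minimal sketch, not part of the original example: rows_to_dicts is a hypothetical helper, and the column names and description entries below are stand-in values (the source does not name the columns of the student table). It relies on the DB-API convention that the first item of each cursor.description entry is the column name, and it assumes Hive may report names in "table.column" form.

```python
def rows_to_dicts(description, rows):
    """Pair each row tuple with the column names from a DB-API cursor description."""
    # The first item of every description entry is the column name.
    # Assumption: names may arrive as "table.column"; keep the part after the last dot.
    columns = [col[0].rsplit('.', 1)[-1] for col in description]
    return [dict(zip(columns, row)) for row in rows]


# Stand-in values shaped like cursor.description and cursor.fetchall();
# the column names "id" and "name" are hypothetical.
description = [
    ('student.id', 'STRING_TYPE', None, None, None, None, True),
    ('student.name', 'STRING_TYPE', None, None, None, None, True),
]
rows = [('1', '孫悟空'), ('2', '明世隱')]

records = rows_to_dicts(description, rows)
for record in records:
    print(record)
```

With a live connection, the same helper would be called as rows_to_dicts(cursor.description, cursor.fetchall()).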

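PyHive works with Hive data by communicating with HiveServer2. Beeline works on a similar principle (it uses Hive JDBC), so when PyHive has trouble connecting, first use Beeline to check whether the connection to the server itself works.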

beeline -u jdbc:hive2:
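Before reaching for Beeline, a quick check from the Python side can show whether the HiveServer2 port is reachable at all. This is only a rough sketch using the standard library: can_reach is a hypothetical helper, and a successful TCP connection says nothing about authentication or whether queries will run, so it does not replace the Beeline check.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures
        return False
```

For the connection settings used earlier, the call would be can_reach('192.168.0.1', 10000).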

Error:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
