Hive: loading a data file from HDFS into a table with load data

hive> load data inpath 'hdfs://ns1/abc/sales_info/hello/sales_info.txt' overwrite into table sales_info partition(dt = '2019-04-26');

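What happens to the original data file (it no longer exists at the source path; it has been moved to the new path):

If you load from the local file system, the local source file still exists afterwards, so the load is effectively a copy. If you load from HDFS, the source file no longer exists at its original path, so the load is effectively a cut (move).

With overwrite, if the target partition already exists, the files previously in it are moved to the trash;

if the target partition does not exist yet, Hive registers the new partition automatically after moving the file, so there is no need to add it manually.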
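
The difference between the two forms can be sketched as follows. This is only an illustration under assumptions: the local path /tmp/sales_info.txt is a hypothetical placeholder that does not appear in the original session, and the two loads are shown side by side for comparison (run back to back, the second would overwrite the first).

```sql
-- Load from the local file system: the file is copied, so the local copy remains.
-- '/tmp/sales_info.txt' is a hypothetical path used only for illustration.
load data local inpath '/tmp/sales_info.txt'
overwrite into table sales_info partition (dt = '2019-04-26');

-- Load from HDFS: the file is moved out of its source path into the partition directory.
load data inpath 'hdfs://ns1/abc/sales_info/hello/sales_info.txt'
overwrite into table sales_info partition (dt = '2019-04-26');

-- Check that the partition is registered without a manual "alter table ... add partition".
show partitions sales_info;
```

Dropping the overwrite keyword keeps the files already in the partition and adds the new file next to them.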

Table DDL:

create table `sales_info`(
  `sku_id` string comment 'product id',
  `sku_name` string comment 'product name',
  `category_id3` string comment 'third-level category id',
  `price` double comment 'sale price',
  `sales_count` bigint comment 'sales quantity'
)
comment 'product sales info table'
partitioned by (`dt` string)
row format delimited
  fields terminated by ','
  null defined as ''
stored as textfile
location 'hdfs://ns1/abc/sales_info';

Data contents:

[abc]$ cat sales_info.txt 
12377,華為mate10,31,999,20
45677,華為mate30,31,2999,30
[abc]$

Create a new directory (hello) on HDFS and put the local file into the target HDFS path:

hive> dfs -mkdir hdfs://ns1/abc/sales_info/hello;
hive> dfs -put sales_info.txt hdfs://ns1/abc/sales_info/hello;
hive> dfs -ls hdfs://ns1/abc/sales_info/hello;
found 1 items
-rw-r--r--   3 a a 61 2019-04-27 17:34

(After the put, the original local file still exists, so put is effectively a copy.)

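Load the data (the table had already been loaded once after it was created; this is the second load) and query the result (there are 2 rows, the latest data; before this load there were 5 rows):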

hive> load data inpath 'hdfs://ns1/abc/sales_info/hello/sales_info.txt' overwrite into table sales_info partition(dt = '2019-04-26');
loading data to table gdm.sales_info partition (dt=2019-04-26)
moved: 'hdfs://ns1/abc/sales_info/dt=2019-04-26/sales_info.txt' to trash at: hdfs://ns1/abc/.trash/current
partition gdm.sales_info stats: [numfiles=1, numrows=0, totalsize=61, rawdatasize=0]
ok
time taken: 0.43 seconds
hive> select *  from sales_info;
ok
sku_id	sku_name	category_id3	price	sales_count	dt
12377	華為mate10	31	999.0	20	2019-04-26
45677	華為mate30	31	2999.0	30	2019-04-26
time taken: 0.049 seconds, fetched: 2 row(s)

Check the original data file again (it no longer exists; it was moved from the source path to the new path):

hive> dfs -ls hdfs://ns1/abc/sales_info/hello;
hive>
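
To confirm where the file ended up, a check along these lines could be run in the same Hive CLI session. It is a sketch, not part of the original session: the partition directory is taken from the "moved ... to trash" line in the load log above, and no output is shown because these commands were not run here.

```sql
-- List the partition directory; the loaded sales_info.txt should now be here.
dfs -ls hdfs://ns1/abc/sales_info/dt=2019-04-26;

-- Show the partition's metadata, including its storage location.
describe formatted sales_info partition (dt = '2019-04-26');
```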

Example 2:

hive> dfs -du -h hdfs://.../tmp.db/test_1030_external_dt; // no partitions or files yet
hive> dfs -du -h hdfs://.../tmp.db/test_1030;
19.5 k  hdfs://.../tmp.db/test_1030/000000_0
192     hdfs://.../tmp.db/test_1030/test2.log // this is the file to load
hive> load data inpath 'hdfs://.../tmp.db/test_1030/test2.log' overwrite into table  tmp.test_1030_external_dt partition (dt='2019-11-11') ;
loading data to table tmp.test_1030_external_dt partition (dt=2019-11-11)
partition tmp.test_1030_external_dt stats: [numfiles=1, numrows=0, totalsize=192, rawdatasize=0]
ok
time taken: 0.777 seconds
hive> dfs -du -h hdfs://.../tmp.db/test_1030;
19.5 k  hdfs://.../tmp.db/test_1030/000000_0 // test2.log is no longer here, so the load was a move
hive> dfs -du -h hdfs://.../tmp.db/test_1030_external_dt;
192  hdfs://.../tmp.db/test_1030_external_dt/dt=2019-11-11 // the partition and its file now exist
hive> dfs -ls  hdfs://.../tmp.db/test_1030_external_dt/dt=2019-11-11;
found 1 items
-rwxr-xr-x   3 jdw_traffic jdw_traffic        192 2019-11-14 19:12 hdfs://.../tmp.db/test_1030_external_dt/dt=2019-11-11/test2.log // the file's timestamp has not changed