Data can be imported into a Hive table in several ways.
1. Importing via an external table. The user creates an external table in Hive and specifies an HDFS path at creation time; when data is copied to that HDFS path, it immediately becomes queryable through the external table.
For example, edit the file test.txt:
$ cat test.txt
1 hello
2 world
3 test
4 case
Fields are separated by '\t'.
Start Hive:
$ hive
Create the external table:
hive>create external table mytest(num int, name string)
> comment 'this is a test'
> row format delimited fields terminated by '\t'
> stored as textfile
> location '/data/test';
ok
time taken: 0.714 seconds
hive> show tables;
ok
mytest
partition_test
partition_test_input
test
time taken: 0.07 seconds
hive> desc mytest ;
ok
num int
name string
time taken: 0.121 seconds
Copy the data to HDFS:
$ hadoop fs -put test.txt /data/test
View the Hive table data:
hive> select * from mytest;
ok
1 hello
2 world
3 test
4 case
time taken: 0.375 seconds
hive>select num from mytest;
total mapreduce jobs = 1
launching job 1 out of 1
total mapreduce cpu time spent: 510 msec
ok
time taken: 27.157 seconds
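Note the difference in timing between the two queries: select * from mytest is answered by a plain file scan and returns almost immediately, whereas selecting a specific column forces Hive of this vintage to launch a MapReduce job, which is why the second query takes about 27 seconds on this single-node setup.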
This approach is typically used when historical data already sits on HDFS and we want to run Hive operations on it. It avoids the overhead of copying the data.
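A related property of external tables is worth noting here; the following is a minimal sketch that reuses the /data/test location from the example above. Dropping an external table removes only the table metadata, so the files under its location stay on HDFS and the data can be re-attached simply by recreating the table:
hive> drop table mytest;
hive> create external table mytest(num int, name string)
    > row format delimited fields terminated by '\t'
    > stored as textfile
    > location '/data/test';
hive> select * from mytest;
The four original rows are still returned, because dropping the external table did not delete /data/test/test.txt on HDFS.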
2. Importing from the local filesystem. When the data is not on HDFS, it can be loaded into a Hive table directly from the local filesystem.
The file /home/work/test.txt has the same content as above.
Create the table:
hive> create table mytest2(num int, name string)
> comment 'this is a test2'
> row format delimited fields terminated by '\t'
> stored as textfile;
ok
time taken: 0.077 seconds
Load the data into the table:
hive>load data local inpath '/home/work/test.txt' into table mytest2;
copying data from file:/home/work/test.txt
copying file: file:/home/work/test.txt
loading data to table default.mytest2
ok
time taken: 0.24 seconds
View the data:
hive> select * from mytest2;
ok
1 hello
2 world
3 test
4 case
time taken: 0.11 seconds
The local path given to this kind of load can be a single file, a directory, or a glob pattern. Note that if a directory is given, it must not contain subdirectories; likewise, a glob pattern can only match files, as shown in the sketch below.
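For illustration, the same load statement accepts a directory or a glob; this is a hedged sketch in which the directory /home/work/testdir and the pattern test*.txt are made-up names, not paths from the example above:
hive> load data local inpath '/home/work/testdir' into table mytest2;
hive> load data local inpath '/home/work/test*.txt' into table mytest2;
In the directory case, every regular file directly under /home/work/testdir is loaded into the table.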
3. Importing from HDFS. The test.txt file above has already been uploaded to /data/test.
The data can then be loaded directly into a Hive table with the following commands:
hive> create table mytest3(num int, name string)
> comment "this is a test3"
> row format delimited fields terminated by '\t'
> stored as textfile;
ok
time taken: 4.735 seconds
hive>load data inpath '/data/test/test.txt' into table mytest3;
loading data to table default.mytest3
ok
time taken: 0.337 seconds
hive>select * from mytest3 ;
ok
1 hello
2 world
3 test
4 case
time taken: 0.227 seconds
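One behavior to keep in mind when loading from an HDFS path (a general Hive property rather than something shown in the transcript above): load data inpath moves the source file into the table's warehouse directory instead of copying it, so test.txt no longer exists under /data/test after the load. Assuming the default warehouse location that appears later in this transcript, this can be checked with:
$ hadoop fs -ls /data/test
$ hadoop fs -ls /user/hive/warehouse/mytest3
The first listing no longer shows test.txt, while the second does; this also means the external table mytest from method 1, which pointed at the same file, is now empty. To keep the original copy in place, use an external table instead, or re-upload the file.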
4. Importing data from another table:
hive> create external table mytest4(num int);
ok
time taken: 0.091 seconds
hive> from mytest3 test3
> insert overwrite table mytest4
> select test3.num where;
total mapreduce jobs = 2
launching job 1 out of 2
number of reduce tasks is set to 0 since there's no reduce operator
starting job = job_201207230024_0002, tracking url = :50030/jobdetails.jsp?jobid=job_201207230024_0002
kill command = /home/work/hadoop/hadoop-1.0.3/libexec/../bin/hadoop job -dmapred.job.tracker=localhost:9001 -kill job_201207230024_0002
2012-07-23 18:59:02,365 stage-1 map = 0%, reduce = 0%
2012-07-23 18:59:08,417 stage-1 map = 100%, reduce = 0%, cumulative cpu 0.62 sec
2012-07-23 18:59:09,435 stage-1 map = 100%, reduce = 0%, cumulative cpu 0.62 sec
2012-07-23 18:59:10,445 stage-1 map = 100%, reduce = 0%, cumulative cpu 0.62 sec
2012-07-23 18:59:11,455 stage-1 map = 100%, reduce = 0%, cumulative cpu 0.62 sec
2012-07-23 18:59:12,470 stage-1 map = 100%, reduce = 0%, cumulative cpu 0.62 sec
2012-07-23 18:59:13,489 stage-1 map = 100%, reduce = 0%, cumulative cpu 0.62 sec
2012-07-23 18:59:14,508 stage-1 map = 100%, reduce = 100%, cumulative cpu 0.62 sec
mapreduce total cumulative cpu time: 620 msec
ended job = job_201207230024_0002
ended job = -174856900, job is filtered out (removed at runtime).
moving data to: hdfs://localhost:9000/tmp/hive-work/hive_2012-07-23_18-58-44_166_189728317691010041/-ext-10000
loading data to table default.mytest4
deleted hdfs://localhost:9000/user/hive/warehouse/mytest4
table default.mytest4 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 2, raw_data_size: 0]
1 rows loaded to mytest4
mapreduce jobs launched:
job 0: map: 1 accumulative cpu: 0.62 sec hdfs read: 242 hdfs write: 2 success
total mapreduce cpu time spent: 620 msec
ok
time taken: 30.663 seconds
hive> select * from mytest4;
ok
time taken: 0.103 seconds
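The where clause in the query above appears truncated in the transcript, so the actual filter condition is unknown. For reference, a complete form of Hive's from ... insert ... select syntax looks like the following; the filter test3.name = 'world' is purely illustrative and is not the condition used in the original run:
hive> from mytest3 test3
    > insert overwrite table mytest4
    > select test3.num where test3.name = 'world';
insert overwrite replaces whatever mytest4 already contains; Hive also supports insert into table mytest4 for appending rows instead.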