Sqoop import to HDFS: format issues

(1) Using the text format

sqoop import --connect jdbc:oracle:thin:@//ip:1521/asmp2 --username --password --query "select * from sbpopt.tt_maintenance_times_correct where \$CONDITIONS" --fields-terminated-by '\t' --delete-target-dir --target-dir /user/asmp/hive/asmp/tt_maintenance_times_correct -m 1

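The result of the import to HDFS:

Columns that were NULL in Oracle were written as the string "null". As a result, when I used

nvl(c.business_correct_times,c.sys_definition_times) in SQL, I found it had no effect on those values, because they were no longer real NULLs.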
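For text imports, the Sqoop user guide also documents `--null-string` and `--null-non-string`, which set what is written for NULL values in string and non-string columns. The following is only a sketch that has not been run against this table: it prints the command rather than executing it, because the connection details and credentials are elided in the original. It assumes the Hive table uses the default text SerDe, which reads `\N` as NULL.

```shell
# Sketch only: the command is stored and printed, not executed.
# --username/--password values are left elided, exactly as in the original post.
cmd=$(cat <<'EOF'
sqoop import \
  --connect jdbc:oracle:thin:@//ip:1521/asmp2 \
  --username --password \
  --query "select * from sbpopt.tt_maintenance_times_correct where \$CONDITIONS" \
  --fields-terminated-by '\t' \
  --null-string '\\N' \
  --null-non-string '\\N' \
  --delete-target-dir \
  --target-dir /user/asmp/hive/asmp/tt_maintenance_times_correct \
  -m 1
EOF
)

# Show the command so it can be reviewed before being run by hand.
printf '%s\n' "$cmd"
```

If this works in your environment, nvl() should see real NULLs again without changing the file format. Whether it does is worth checking on a small sample first.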

(2) So I had to switch to the Parquet format

sqoop import --connect jdbc:oracle:thin:@//ip:1521/asmp2 --username --password --query "select * from sbpopt.tt_maintenance_times_correct where \$CONDITIONS" --as-parquetfile --delete-target-dir --target-dir /user/asmp/hive/asmp/tt_maintenance_times_correct -m 1

The result of the import to HDFS:

Hive CREATE TABLE statement (the column types must match the types in the file written by Sqoop):

drop table asmp.tt_maintenance_times_correct;
create external table if not exists asmp.tt_maintenance_times_correct (
  id string,
  product_code string,
  product_name string,
  first_billing_date bigint,
  last_billing_date bigint,
  sale_amount string,
  sys_definition_times string,
  business_correct_times string,
  correct_status string,
  correct_date bigint,
  correct_people string,
  create_by string,
  create_date bigint,
  update_by string,
  update_date bigint
)
comment 'asmp temporary table'
stored as parquet
location '/user/asmp/hive/asmp/tt_maintenance_times_correct';

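If a column in the Oracle table contains newline characters, the row count grows once the data is loaded into Hive as delimited text (each record is split into two rows at every newline). These special characters therefore need to be handled, using one of the following options: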

# Replace newlines and other special characters with " "
--hive-delims-replacement " "
# Delete newlines and other special characters
--hive-drop-import-delims
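The row inflation described above can be reproduced locally without Sqoop. This is a minimal simulation with a made-up record (the id `42` and the field text are hypothetical). It mimics the replacement that `--hive-delims-replacement " "` is documented to perform on `\n`, `\r` and `\01` in string fields. It does not call Sqoop itself.

```shell
# Hypothetical string field containing an embedded newline.
field=$(printf 'line one\nline two')

# One logical record: an id, a tab, then the field.
row=$(printf '42\t%s' "$field")

# Written as-is, the single record occupies two physical lines,
# which a text-format Hive table would read as two rows.
raw_lines=$(printf '%s\n' "$row" | wc -l | tr -d ' ')

# Replace \n, \r and \001 inside the field with a space, then rebuild the record.
clean_field=$(printf '%s' "$field" | tr '\n\r\001' '   ')
clean_lines=$(printf '42\t%s\n' "$clean_field" | wc -l | tr -d ' ')

echo "raw: $raw_lines line(s), cleaned: $clean_lines line(s)"
# prints "raw: 2 line(s), cleaned: 1 line(s)"
```

Replacing with a space keeps words on either side of the newline apart, while `--hive-drop-import-delims` removes the characters entirely and joins the surrounding text.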
