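(1) Get the FileSystem
import org.apache.hadoop.fs.{FileSystem, Path}

// 1. Create the FileSystem
def getHdfs(path: String): FileSystem = ???  // implementation omitted
(2) Find the latest directory by timestamp
def findCandidate(fileSystem: FileSystem, fsPath: String): Path = ???  // implementation omitted
(3) Read all valid data files under the latest directory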
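The two helpers from steps (1) and (2) appear only as signatures. As a minimal sketch of what they might contain, the code below opens the FileSystem for a path and picks the newest subdirectory. The object name `LatestDirSketch` and both bodies are hypothetical, and "latest" is assumed to mean the most recent modification time; if the directories are instead named after timestamps, sorting by name would be the closer match.

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object LatestDirSketch {
  // Hypothetical body: open the FileSystem that owns `path`
  // (hdfs://... on a cluster, file://... on a local machine).
  def getHdfs(path: String): FileSystem =
    FileSystem.get(URI.create(path), new Configuration())

  // Hypothetical body: among the immediate subdirectories of `fsPath`,
  // return the one with the newest modification time.
  // Assumption: "latest" means most recently modified.
  def findCandidate(fileSystem: FileSystem, fsPath: String): Path = {
    val dirs = fileSystem.listStatus(new Path(fsPath)).filter(_.isDirectory)
    require(dirs.nonEmpty, s"no subdirectories under $fsPath")
    dirs.maxBy(_.getModificationTime).getPath
  }
}
```

Because Hadoop's FileSystem abstraction also covers local paths, the same two calls can be tried against a `file://` directory before pointing them at HDFS.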
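// Get the latest directory
val validPath = findCandidate(getHdfs(path), path)
println("validFilePath: " + validPath)
// Match every part file under that directory
val finalPath = validPath.toString.concat("/part-*")
println("finalPath: " + finalPath)
val result = spark.read.text(finalPath)
spark.read.text(finalPath) reads the matched files into a DataFrame with a single string column named value, one row per line of input.
(4) Parse the JSON stored one object per line in the files, and save the parsed records into a new DataFrame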
import com.alibaba.fastjson.JSON  // assumption: getIntValue/getJSONArray match the fastjson API

val list = result.collect()  // brings every line back to the driver
for (row <- list) {
  // Assumption: the parsing call is missing from the original; each row holds one JSON line
  val json = JSON.parseObject(row.getString(0))
  val features = json.getJSONArray("feature").toArray
  val imgId = json.getString("img_id")
  val imgUrl = json.getString("img_url")
  val width = json.getIntValue("width")
  val height = json.getIntValue("height")
  val date = json.getString("date")
  val isImg = json.getString("type")
  val extention = json.getString("extention")
  val path = json.getString("path")
  val source = json.getString("source")
  // adIdsList is not defined in this excerpt
  dataList.add(Row(adIdsList.toArray, features, imgId, imgUrl, width, height, date, isImg, extention, path, source))
}
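To show the per-line parsing in isolation, here is a minimal sketch that turns one JSON line into a typed record. It assumes the JSON library is fastjson (`com.alibaba.fastjson`), which the `getIntValue` and `getJSONArray` calls suggest but the text does not state; `ParseLineSketch`, `ImageMeta`, and the sample line are hypothetical, and only a few of the fields are covered.

```scala
import com.alibaba.fastjson.JSON

object ParseLineSketch {
  // Hypothetical record holding a subset of the fields read in the loop.
  case class ImageMeta(imgId: String, width: Int, height: Int, features: Seq[String])

  // Parse a single line of JSON text into an ImageMeta.
  def parse(line: String): ImageMeta = {
    val json = JSON.parseObject(line)
    val feats = json.getJSONArray("feature")
    ImageMeta(
      imgId = json.getString("img_id"),
      width = json.getIntValue("width"),
      height = json.getIntValue("height"),
      // Read each array element as a string, matching ArrayType(StringType)
      features = (0 until feats.size).map(i => feats.getString(i))
    )
  }

  def main(args: Array[String]): Unit = {
    val sample = """{"img_id":"a1","width":640,"height":480,"feature":["0.1","0.2"]}"""
    println(parse(sample))
  }
}
```

Keeping the parse in a small function like this makes it easy to test on sample lines before running it over the collected rows.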
Note that the schema of the Rows held in dataList must be defined beforehand, as shown below:
import java.util
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val schema = StructType(List(
  StructField("s_ad_id", ArrayType(LongType, true), true),
  StructField("feature", ArrayType(StringType, true), true),
  StructField("img_id", StringType, true),
  StructField("img_url", StringType, true),
  StructField("width", IntegerType, true),
  StructField("height", IntegerType, true),
  StructField("date", StringType, true),
  StructField("format_type", StringType, true),
  StructField("extention", StringType, true),
  StructField("path", StringType, true),
  StructField("source", StringType, true)
))
val dataList = new util.ArrayList[Row]()
(5) Create a new DataFrame from dataList
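The schema and the Java list of Rows are the two inputs to `createDataFrame`. The following sketch shows how they fit together on a local session, using a shortened three-column schema and made-up rows; `CreateDfSketch`, the sample values, and the `local[1]` master are assumptions for illustration, not part of the original job.

```scala
import java.util
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

object CreateDfSketch {
  // Build a small DataFrame from a java.util.List[Row] plus an explicit schema.
  def build(spark: SparkSession): DataFrame = {
    val schema = StructType(List(
      StructField("img_id", StringType, true),
      StructField("width", IntegerType, true),
      StructField("height", IntegerType, true)
    ))
    // Hypothetical sample rows; field order and types must match the schema
    val rows = new util.ArrayList[Row]()
    rows.add(Row("img-1", 640, 480))
    rows.add(Row("img-2", 800, 600))
    spark.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("create-df-sketch")
      .getOrCreate()
    val df = build(spark)
    df.printSchema()
    df.show()
    spark.stop()
  }
}
```

Spark checks Row values against the schema only when the data is evaluated, so a type mismatch between a Row field and its StructField tends to surface at the first action rather than at `createDataFrame` itself.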
var df2 = spark.createDataFrame(dataList, schema)
PS: To be continued.