While debugging a Spark job, I ran into the following error:
log length: 2195
Traceback (most recent call last):
TypeError: Can not infer schema for type:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "330925675.py", line 209, in run()
  File "330925675.py", line 199, in run
    .createDataFrame(input_data, ["doc_id", "doc_title", "doc_attribute","box_map","has_summary_img","catalog_map","content_img_num","index"])
TypeError: not supported type:
The Spark code is as follows:
import json

from pyspark import SparkConf
from pyspark.sql import SparkSession


def analysis_row2(row):
    # get_infobox_data, get_summary_img, parser_catalog and parser_content_img
    # are helpers defined elsewhere in the job (not shown here).
    box_map = {}
    attr_list, link_list = get_infobox_data(row.json_info_box)
    for attr in attr_list:
        box_map[attr.key] = attr.value
    # Info box serialized as a JSON string
    box_map_str = json.dumps(box_map, ensure_ascii=False)

    # 1 if a summary image exists, -1 if get_summary_img returned "err", 0 otherwise
    has_summary_img = 0
    summary_img = get_summary_img(row.json_summary)
    if len(summary_img) > 0:
        has_summary_img = 1
    if summary_img == "err":
        has_summary_img = -1

    catalog_map = parser_catalog(row.json_catalog)
    catalog_map_str = json.dumps(catalog_map, ensure_ascii=False)
    content_img_arr = parser_content_img(row.json_content)
    content_img_num = len(content_img_arr)
    # Note the row.index at the end of this return statement
    return row.doc_id, row.doc_title, row.doc_attribute, box_map_str, has_summary_img, catalog_map_str, content_img_num, row.index


def run():
    spark = SparkSession \
        .builder \
        .enableHiveSupport() \
        .config("hive.exec.dynamic.partition", "true") \
        .config("hive.exec.dynamic.partition.mode", "nonstrict") \
        .getOrCreate()
    conf = SparkConf()
    date = str(conf.get("spark.biz.date"))
    task_type = "doc_detail_stat"
    df = spark.sql("""select
            id as doc_id,
            index,
            stat.doc_title,
            doc_attribute,
            json_catalog,
            json_info_box,
            json_summary,
            json_content
        from
            bk.xiaoxu
            join (
                select
                    doc_id,
                    json_catalog,
                    doc_title,
                    doc_attribute,
                    json_info_box,
                    json_summary,
                    json_content
                from
                    bk.mds_midas_latest_doc_stats
                where
                    date = 20200828
            ) as stat on stat.doc_id = id """
    )
    input_data = df.rdd.map(lambda row: analysis_row2(row))
When I ran the Spark job during debugging, it failed with the error shown above.
Troubleshooting
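It turned out that row.index cannot be used in the return value of analysis_row2. My first guess was that index is some kind of reserved keyword. More precisely, a PySpark Row is a tuple subclass, so row.index resolves to the built-in tuple.index method rather than to the index column, and createDataFrame then cannot infer a schema for a method object. After renaming the column to this_index, the job ran successfully.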
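The lookup behaviour behind this can be reproduced without a cluster. The sketch below is only an illustration: MiniRow is a hypothetical, heavily simplified stand-in for pyspark.sql.Row (not the real class), and shadowed_columns is a helper name made up here. It assumes only that the row type subclasses tuple and serves column names through __getattr__.

```python
class MiniRow(tuple):
    """Simplified stand-in for pyspark.sql.Row (hypothetical, illustration only)."""

    def __new__(cls, **fields):
        row = super().__new__(cls, fields.values())
        row._names = list(fields)  # column names, in order
        return row

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal attribute lookup has failed,
        # so names that tuple already defines (index, count) never reach this point.
        if name.startswith("_") or name not in self._names:
            raise AttributeError(name)
        return self[self._names.index(name)]

    def __getitem__(self, key):
        # Allow lookup by column name as well as by position.
        if isinstance(key, str):
            key = self._names.index(key)
        return super().__getitem__(key)


def shadowed_columns(columns):
    """Return the column names that collide with built-in tuple attributes."""
    return [c for c in columns if hasattr(tuple, c)]


row = MiniRow(doc_id=7, index=3)

print(row.doc_id)           # attribute access falls through to the column value
print(callable(row.index))  # this is the tuple.index method, not the column
print(row["index"])         # lookup by name returns the column value
print(shadowed_columns(["doc_id", "doc_title", "index", "count"]))
```

In PySpark itself, reading the column as row["index"] or aliasing it in the SQL (for example to this_index, as done here) should avoid the collision. A column named count would run into the same problem.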