Writing to Redis from Spark: A Summary of Spark Resource Configuration

19/10/16 11:22:06 error yarnclusterscheduler: lost executor 28 on **********: container marked as failed: container_********** on host: **********. exit status: 137. diagnostics: container killed on request. exit code is 137 killed by external signal
19/10/16 11:32:59 error yarnclusterscheduler: lost executor 38 on 100.76.80.197: container marked as failed: container_********** on host: **********. exit status: 137. diagnostics: container killed on request. exit code is 137 killed by external signal
19/10/16 11:40:27 error yarnclusterscheduler: lost executor 39 on **********: container marked as failed: container_1567762627991_1638740_01_000343 on host: **********. exit status: 137. diagnostics: container killed on request. exit code is 137 killed by external signal
19/10/16 11:49:29 error yarnclusterscheduler: lost executor 40 on **********: container marked as failed: container********** on host: **********. exit status: 137. diagnostics: container killed on request. exit code is 137 killed by external signal
19/10/16 11:49:29 error tasksetmanager: task 51 in stage 4.0 failed 4 times; aborting job

driver stack trace:

org.apache.spark.sparkexception: job aborted due to stage failure: task 51 in stage 4.0 failed 4 times, most recent failure: lost task 51.3 in stage 4.0 (tid 160, **********, executor 40): executor lost failure (executor 40 exited caused by one of the running tasks) reason: container marked as failed: container_********** on host: 100.76.26.136. exit status: 137. diagnostics: container killed on request. exit code is 137 killed by external signal

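With this kind of failure, the most likely cause is an unreasonable resource configuration for the Redis write: the data volume is too large and exceeds what Redis can handle.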

The key Spark resource settings are as follows:

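The error messages show that YARN lost executors, most likely because the executors were killed (exit status 137 is 128 + 9, i.e. the process received SIGKILL). So it is still worth checking whether your driver-memory and executor-memory are large enough.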
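As a rough illustration of where these memory settings live, here is a minimal sketch using Spark's `SparkConf`. The object name `RedisJobConf`, the application name, and every value are assumptions chosen for the example, not settings from the original post. The overhead key shown is the Spark 2.0 name on YARN.

```scala
import org.apache.spark.SparkConf

object RedisJobConf {
  // All values below are illustrative assumptions; tune them for your own cluster.
  def build(): SparkConf = new SparkConf()
    .setAppName("spark-write-redis")                    // hypothetical application name
    .set("spark.executor.memory", "8g")                 // heap per executor
    .set("spark.yarn.executor.memoryOverhead", "2048")  // off-heap headroom per executor, in MiB
    .set("spark.executor.instances", "20")              // number of executors
}
```

Driver memory is not set here because the driver JVM is already running by the time this code executes; it has to be passed at launch, for example with the `--driver-memory` option of spark-submit.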

This uses the Scala API of Spark 2.0 together with the Jedis client API. The dependency is as follows:

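<dependency>
    <groupId>redis.clients</groupId>
    <artifactId>jedis</artifactId>
    <version>2.9.0</version>
    <type>jar</type>
</dependency>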

The code that writes the data to Redis is as follows:

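import redis.clients.jedis.Jedis
sampleData.repartition(500).foreachPartition { rows =>
  val jedis = new Jedis(redisHost, redisPort) // assumed names; the original body was lost
  val pipe = jedis.pipelined()
  rows.foreach(r => pipe.set(r.getAs[String]("key"), r.getAs[String]("value"))) // assumed columns
  pipe.sync() // flush the whole partition in one batch
  jedis.close()
}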

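Using a pipeline for batch inserts is recommended: the performance difference between batch inserts and row-by-row inserts is very large. But batch inserts come with a serious pitfall. In the code above, the whole partition is inserted in a single batch. If one partition happens to be very large (more than the Redis pipeline can absorb at its write speed or within its timeout), Redis can run out of memory (or time out), and the service becomes unavailable!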
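One complementary safeguard, which is not from the original post, is to flush the pipeline every fixed number of records instead of once per partition. The sketch below shows only the chunking logic in plain Scala. `BatchedWrite`, `inBatches`, and the `flush` callback are hypothetical names introduced for illustration.

```scala
object BatchedWrite {
  /** Feeds `rows` to `flush` in chunks of at most `batchSize` items and
    * returns how many chunks were flushed. */
  def inBatches[T](rows: Iterator[T], batchSize: Int)(flush: Seq[T] => Unit): Int = {
    require(batchSize > 0, "batchSize must be positive")
    var batches = 0
    rows.grouped(batchSize).foreach { chunk =>
      flush(chunk) // e.g. queue each item on a Jedis pipeline, then call pipe.sync()
      batches += 1
    }
    batches
  }
}
```

Inside foreachPartition, the `flush` callback would queue each record of the chunk on the pipeline and then call `pipe.sync()`, so at most `batchSize` commands are in flight at any time, however large the partition is.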

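The fix is to repartition the whole Dataset before calling foreachPartition, so that no single partition holds too much data. Keeping each partition at roughly 1k to 20k records is recommended. In the example above, sampleData is split into 500 partitions of about 10,000 records each, so sampleData holds about 5 million records in total. However, if the total data volume is very large and each partition is small, the number of partitions becomes very high. In that case the driver needs more memory, otherwise the driver itself will run out of memory.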
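The arithmetic behind the partition count can be sketched as a small helper. `PartitionSizing` and `partitionsFor` are hypothetical names, and the total record count is assumed to be known already (for example from a prior count of the Dataset).

```scala
object PartitionSizing {
  /** Number of partitions needed so that each holds at most
    * `rowsPerPartition` records (ceiling division, never below 1). */
  def partitionsFor(totalRows: Long, rowsPerPartition: Int): Int = {
    require(totalRows >= 0, "totalRows must be non-negative")
    require(rowsPerPartition > 0, "rowsPerPartition must be positive")
    math.max(1L, (totalRows + rowsPerPartition - 1) / rowsPerPartition).toInt
  }
}
```

With 5 million records and a target of 10,000 per partition, this gives the 500 partitions used in the example; the result can be passed directly to repartition.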

Observe the QPS of concurrent writes to Redis under different numbers of executors, and adjust until the QPS falls within an acceptable range.
