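Question: use Spark to sort the data, first by face value (the attractiveness score, fv) from high to low; if two face values are equal, sort by age in ascending order.
1. The User class extends Ordered and is serializable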
package cn.edu360.spark.day06
import org.apache.log4j.{Level, Logger}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
/**
 * Custom sorting
 * Created by zhangjingcun on 2018/9/27 17:13.
 */
object CustomSort1 { ... }
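The body of this first variant is not shown, so the following is only a minimal sketch of what a User class that extends Ordered and is serializable might look like. The class name `User`, its field names, and the helper object `UserOrderingSketch` are assumptions made for illustration. The ordering is exercised on a plain Scala collection so that it runs without a cluster; on an RDD the same `compare` method would presumably be picked up by `sortBy`, and the `Serializable` marker matters there because instances are shipped between executors during the shuffle.

```scala
// Hypothetical User class: the field names (id, name, fv, age) are assumptions.
class User(val id: Long, val name: String, val fv: Int, val age: Int)
    extends Ordered[User] with Serializable {

  // Higher face value first; on a tie, younger age first.
  override def compare(that: User): Int =
    if (this.fv == that.fv) Integer.compare(this.age, that.age)
    else Integer.compare(that.fv, this.fv)

  override def toString: String = s"User($id, $name, $fv, $age)"
}

// Hypothetical helper object used only to exercise the ordering locally.
object UserOrderingSketch {
  // Parse one "id,name,fv,age" line into a User.
  def parse(line: String): User = {
    val fields = line.split(",")
    new User(fields(0).toLong, fields(1), fields(2).toInt, fields(3).toInt)
  }

  // Sort with the ordering defined by User.compare and return the names.
  def sortedNames(lines: Seq[String]): Seq[String] =
    lines.map(parse).sorted.map(_.name)
}
```

With the four sample records used later in this post, `UserOrderingSketch.sortedNames` returns tom, mike, marry, jim.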
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
/**
 * Created by zhangjingcun on 2018/9/27 17:32.
 */
object CustomSort2 { ... }
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
/**
 * Created by zhangjingcun on 2018/9/27 17:37.
 */
object CustomSort3 { ... }
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
/**
 * Created by zhangjingcun on 2018/9/27 17:41.
 */
object CustomSort4 {
  def main(args: Array[String]): Unit = {
    // Silence Spark's own log output
    Logger.getLogger("org.apache.spark").setLevel(Level.OFF)
    val conf = new SparkConf().setAppName("iplocation").setMaster("local[*]")
    val sc = new SparkContext(conf)
    // Sort by face value from high to low; if the face values are equal, sort by age in ascending order
    val users: Array[String] = Array("1,tom,99,34", "2,marry,96,26", "3,mike,98,29", "4,jim,96,30")
    // Parallelize into an RDD
    val userLines: RDD[String] = sc.makeRDD(users)
    // Parse each line into an (id, name, fv, age) tuple
    val tpRDD: RDD[(Long, String, Int, Int)] = userLines.map(line => {
      val fields = line.split(",")
      val id = fields(0).toLong
      val name = fields(1)
      val fv = fields(2).toInt
      val age = fields(3).toInt
      (id, name, fv, age)
    })
    // Rely on how tuples compare: the first elements are compared first, and only if they are equal is the next element compared.
    // Negating fv turns the ascending tuple order into descending order on face value.
    implicit val rules: Ordering[(Long, String, Int, Int)] =
      Ordering[(Int, Int)].on[(Long, String, Int, Int)](t => (-t._3, t._4))
    val sorted = tpRDD.sortBy(t => t)
    // Collect the data to the driver
    val result: Array[(Long, String, Int, Int)] = sorted.collect()
    println(result.toBuffer)
    sc.stop()
  }
}
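To see what the implicit rule does without starting Spark, the same `Ordering[(Int, Int)].on[...]` expression can be applied to a local collection. This is only a sketch: the object name `TupleOrderingSketch`, the type alias `UserTuple`, and the method `sortUsers` are hypothetical names introduced for illustration.

```scala
// Hypothetical helper that applies the same tuple-based rule to a local Seq.
object TupleOrderingSketch {
  type UserTuple = (Long, String, Int, Int) // (id, name, fv, age)

  // Derive an ordering on the full record from the (-fv, age) key:
  // the negated fv gives descending face value, age breaks ties ascending.
  val rules: Ordering[UserTuple] =
    Ordering[(Int, Int)].on[UserTuple](t => (-t._3, t._4))

  // Sort a local collection by passing the rule explicitly.
  def sortUsers(users: Seq[UserTuple]): Seq[UserTuple] =
    users.sorted(rules)
}
```

For the four sample records, `sortUsers` yields tom (99), mike (98), marry (96, age 26), jim (96, age 30), which is the order the Spark job above is expected to print.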