Getting Started with Spark Machine Learning Through a Simple Example

2021-08-21 03:12:15

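I. Before reading this example, you need:

1) Some familiarity with Scala syntax

2) A Spark environment on your local machine, preferably with Hadoop installed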

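II. A simple LR (logistic regression) classification model

Step 1: Convert the data to LabeledPoint format. References: the ML data types page on the Spark website; a clear and concise online book on Spark data processing

Step 2: Call the Spark library to run the algorithm. Reference: the logistic regression implementation on the Spark website

The demo environment below is spark-shell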

scala> sc // spark-shell creates a variable named sc by default; it is the SparkContext instance

res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@b5de9ac

// Read the data

scala> val rdd1 = sc.textFile("hdfs://bipcluster/user/platform_user/jiping.liu/dataspark.csv")

scala> rdd1.first() // Spark evaluates lazily: nothing is computed until an action such as first() is called, a bit like TensorFlow

// The leading 0 is the label; what follows are the features in the libsvm index:value format

res1: String = 0 0:0.14447325 1:24.5 2:184.433 3:291.9 4:0.0382946 5:8.142114 6:2.8 7:65.86893....

// Data processing

scala> :paste // spark-shell command for entering a multi-line block of code

// Entering paste mode (ctrl-D to finish)

val datapoint = rdd1.map(line =>

// Exiting paste mode, now interpreting.

scala> datapoint.first()

res2: org.apache.spark.mllib.regression.LabeledPoint =

(0.0,(5000,[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,55,56,57,58,59,60,61....
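
The body of the map call above did not survive in this copy of the post, so the author's exact parsing code is unknown. As a rough sketch only, a line in the label index:value format could be split into its parts like this; the helper name parseLibSvmLine is hypothetical and the code is plain Scala with no Spark dependency.

```scala
// Hypothetical helper (not from the original post): split one
// "label index:value index:value ..." line into label, indices and values.
def parseLibSvmLine(line: String): (Double, Array[Int], Array[Double]) = {
  val tokens = line.trim.split("\\s+")
  val label = tokens.head.toDouble
  // Each remaining token looks like "3:291.9"
  val pairs = tokens.tail.map { tok =>
    val Array(i, v) = tok.split(":")
    (i.toInt, v.toDouble)
  }
  (label, pairs.map(_._1), pairs.map(_._2))
}

val (label, indices, values) = parseLibSvmLine("0 0:0.14447325 1:24.5 2:184.433")
// Inside spark-shell, the parts could then be wrapped as
//   LabeledPoint(label, Vectors.sparse(5000, indices, values))
// where 5000 is the vector size shown in the output above. The indices are
// kept exactly as they appear in the line; whether the author shifted them
// is an open assumption.
```

With a function like this, the map step would amount to applying it to every line of rdd1 and building a LabeledPoint from the result.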

// Import the model classes

scala> import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS

scala> import org.apache.spark.mllib.evaluation.MulticlassMetrics

// Split the dataset into train and test

scala> val splits = datapoint.randomSplit(Array(0.6, 0.4), seed = 11L)

scala> val train = splits(0)

scala> val test = splits(1)

scala> train.first()

res4: org.apache.spark.mllib.regression.LabeledPoint = (0.0,(5000,[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,55,56,57,58,59,60,61,62,63,69,...
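
To make the idea of a seeded, weighted split more concrete, here is a minimal plain-Scala sketch. It is not Spark's implementation; the function weightedSplit is hypothetical and only illustrates the principle: normalize the weights, draw one seeded random number per element, and place the element in the bucket whose cumulative range contains that number.

```scala
import scala.util.Random

// Hypothetical illustration of a seeded weighted split (not Spark's code).
def weightedSplit[A](items: Seq[A], weights: Seq[Double], seed: Long): Seq[Seq[A]] = {
  val total = weights.sum
  // Cumulative upper bounds, e.g. weights (0.6, 0.4) give bounds (0.6, 1.0)
  val bounds = weights.scanLeft(0.0)(_ + _).tail.map(_ / total)
  val rng = new Random(seed)
  val tagged = items.map { item =>
    val x = rng.nextDouble()
    val bucket = bounds.indexWhere(b => x < b)
    // Guard against floating-point rounding in the last bound
    (if (bucket >= 0) bucket else weights.length - 1, item)
  }
  weights.indices.map(b => tagged.filter(_._1 == b).map(_._2))
}

val parts = weightedSplit(1 to 1000, Seq(0.6, 0.4), seed = 11L)
val trainPart = parts(0)
val testPart = parts(1)
```

Because the generator is seeded, calling the function again with the same arguments yields the same two parts, which is the reason for passing a fixed seed when results need to be reproducible.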

// Train the model

scala> val model = new LogisticRegressionWithLBFGS().setNumClasses(2).run(train)

18/06/29 19:23:08 WARN [com.github.fommil.netlib.BLAS(61) -- main]: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS

18/06/29 19:23:08 WARN [com.github.fommil.netlib.BLAS(61) -- main]: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS

model: org.apache.spark.mllib.classification.LogisticRegressionModel = org.apache.spark.mllib.classification.LogisticRegressionModel: intercept = 0.0, numFeatures = 5000, numClasses = 2, threshold = 0.5
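
The model summary above reports an intercept and a threshold of 0.5. As a minimal sketch of what a binary logistic regression prediction does with those values, the plain-Scala function below computes the margin, passes it through the sigmoid, and compares the score with the threshold. The name predictBinary and the example weights are made up for illustration; they are not taken from the trained model.

```scala
// Hypothetical illustration of binary logistic regression prediction
// on a sparse feature vector given as (indices, values).
def predictBinary(weights: Array[Double], intercept: Double,
                  indices: Array[Int], values: Array[Double],
                  threshold: Double = 0.5): Double = {
  // margin = w . x + b, touching only the non-zero features
  val margin = indices.zip(values).map { case (i, v) => weights(i) * v }.sum + intercept
  // The sigmoid maps the margin to a score in (0, 1)
  val score = 1.0 / (1.0 + math.exp(-margin))
  if (score > threshold) 1.0 else 0.0
}

// Made-up weights for a 3-feature model
val w = Array(2.0, -1.0, 0.0)
val positive = predictBinary(w, 0.0, Array(0, 1), Array(1.0, 0.5)) // margin = 1.5
val negative = predictBinary(w, 0.0, Array(0, 1), Array(0.0, 3.0)) // margin = -3.0
```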

// Test and evaluate the model

scala> :paste

// Entering paste mode (ctrl-D to finish)

val preandtrue = test.map

// Exiting paste mode, now interpreting.

scala> val metrics = new MulticlassMetrics(preandtrue)

metrics: org.apache.spark.mllib.evaluation.MulticlassMetrics = org.apache.spark.mllib.evaluation.MulticlassMetrics@689f9dc8

scala> preandtrue

scala> preandtrue.first

def first(): (Double, Double)

scala> preandtrue.first()

res6: (Double, Double) = (0.0,0.0)

scala> val accuracy = metrics.accuracy

accuracy: Double = 0.885496183206106
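
For intuition about the accuracy figure, the sketch below computes accuracy by hand from (prediction, label) pairs, which is the element type shown by preandtrue.first() above. It is plain Scala on an in-memory sequence with made-up pairs; the function name accuracyOf is hypothetical.

```scala
// Hypothetical illustration: accuracy is the share of pairs whose
// prediction equals the true label.
def accuracyOf(pairs: Seq[(Double, Double)]): Double =
  if (pairs.isEmpty) 0.0
  else pairs.count { case (prediction, label) => prediction == label }.toDouble / pairs.size

// Made-up (prediction, label) pairs: 3 of 4 are correct
val samplePairs = Seq((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 0.0))
val sampleAccuracy = accuracyOf(samplePairs)
// In spark-shell, such pairs could be produced with something like
//   test.map(p => (model.predict(p.features), p.label))
// which is an assumption about the elided paste block, not the author's code.
```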
