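I. Before working through this example you need:
1) A basic familiarity with Scala syntax
2) A Spark environment on your local machine, preferably with Hadoop installed
II. A simple LR classification model
Step 1: Convert the data to LabeledPoint format. References: the ML data formats page on the official Spark site; a short, clear online book on Spark data processing
Step 2: Call the Spark library to run the algorithm. Reference: the logistic regression implementation on the official Spark site
The demonstration below runs in spark-shell.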
scala> sc // spark-shell creates a variable named sc by default; it is the SparkContext instance
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@b5de9ac
// Read the data
scala> val rdd1 = sc.textFile("hdfs://bipcluster/user/platform_user/jiping.liu/dataspark.csv")
scala> rdd1.first() // Spark evaluates lazily: nothing is computed until an action such as first() is called, somewhat like TensorFlow. The data is in LIBSVM format: the leading 0 is the label, followed by the features as index:value pairs
res1: String = 0 0:0.14447325 1:24.5 2:184.433 3:291.9 4:0.0382946 5:8.142114 6:2.8 7:65.86893....
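The lazy behaviour noted in the comment above can be illustrated without a cluster. The following is only an analogy in plain Scala, with an Iterator standing in for an RDD; the names `evaluated` and `doubled` are made up for this illustration and are not part of Spark.

```scala
// Plain-Scala analogy for lazy transformations vs. actions (not Spark code).
var evaluated = 0 // counts how many elements have actually been processed

// Like an RDD transformation, Iterator.map only records the work to do.
val doubled = Iterator(1, 2, 3, 4, 5).map { x =>
  evaluated += 1
  x * 2
}
val beforeAction = evaluated // still 0: nothing has been computed yet

// Like an action such as first(), next() forces just enough computation.
val firstValue = doubled.next()
val afterAction = evaluated // only one element was processed
```

In the same way, `sc.textFile(...)` and `map(...)` in the transcript only describe the computation; the file is read when `first()` is called.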
// Process the data
scala> :paste // command for entering a multi-line block in spark-shell
// Entering paste mode (ctrl-D to finish)
val datapoint = rdd1.map(line =>
// Exiting paste mode, now interpreting.
scala> datapoint.first()
res2: org.apache.spark.mllib.regression.LabeledPoint =
(0.0,(5000,[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,55,56,57,58,59,60,61....
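The body of the `map` call is not shown in full, so the following is a minimal sketch of one way to parse a line in the `label index:value ...` layout printed by `rdd1.first()`. The helper name `parseLibSvmLine` is hypothetical, and details such as any index offset the author applied are assumptions.

```scala
// Hypothetical helper: splits one "label index:value index:value ..." line
// into the pieces a sparse LabeledPoint needs. Not the original author's code.
def parseLibSvmLine(line: String): (Double, Array[Int], Array[Double]) = {
  val tokens = line.trim.split("\\s+")
  val label = tokens.head.toDouble
  // Each remaining token has the form index:value.
  val pairs = tokens.tail.map { token =>
    val Array(index, value) = token.split(":")
    (index.toInt, value.toDouble)
  }
  (label, pairs.map(_._1), pairs.map(_._2))
}

// Example on a shortened version of the line printed above.
val (label, indices, values) = parseLibSvmLine("0 0:0.14447325 1:24.5 2:184.433")
```

In spark-shell the three parts could then be wrapped as `LabeledPoint(label, Vectors.sparse(5000, indices, values))`, with `LabeledPoint` from `org.apache.spark.mllib.regression` and `Vectors` from `org.apache.spark.mllib.linalg`; 5000 is the vector size shown in the output above.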
// Import the model classes
scala> import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
scala> import org.apache.spark.mllib.evaluation.MulticlassMetrics
// Split the dataset into train and test
scala> val splits = datapoint.randomSplit(Array(0.6, 0.4), seed = 11L)
scala> val train = splits(0)
scala> val test = splits(1)
scala> train.first()
res4: org.apache.spark.mllib.regression.LabeledPoint = (0.0,(5000,[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,55,56,57,58,59,60,61,62,63,69,...
// Train the model
scala> val model = new LogisticRegressionWithLBFGS().setNumClasses(2).run(train)
18/06/29 19:23:08 WARN [com.github.fommil.netlib.BLAS(61) -- main]: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
18/06/29 19:23:08 WARN [com.github.fommil.netlib.BLAS(61) -- main]: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
model: org.apache.spark.mllib.classification.LogisticRegressionModel = org.apache.spark.mllib.classification.LogisticRegressionModel: intercept = 0.0, numFeatures = 5000, numClasses = 2, threshold = 0.5
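The model summary above reports an intercept and a threshold of 0.5. As a rough sketch of what a binary logistic regression model does with those values at prediction time, the plain-Scala code below scores a dense feature array; the function names and the example weights are made up for illustration and do not come from the trained model.

```scala
// Sketch of binary logistic-regression scoring; the weights used below are made up.
def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

def predict(weights: Array[Double], intercept: Double,
            features: Array[Double], threshold: Double = 0.5): Double = {
  // Linear margin: w . x + b
  val margin = weights.zip(features).map { case (w, x) => w * x }.sum + intercept
  // Class 1 when the estimated probability exceeds the threshold, else class 0.
  if (sigmoid(margin) > threshold) 1.0 else 0.0
}

val positive = predict(Array(2.0, -1.0), 0.0, Array(1.0, 0.5)) // margin = 1.5
val negative = predict(Array(2.0, -1.0), 0.0, Array(0.0, 1.0)) // margin = -1.0
```

The predictions are doubles (0.0 or 1.0), which matches the `(Double, Double)` pairs that appear in the evaluation step.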
// Test and evaluate the model
scala> :paste
// Entering paste mode (ctrl-D to finish)
val preandtrue = test.map
// Exiting paste mode, now interpreting.
scala> val metrics = new MulticlassMetrics(preandtrue)
metrics: org.apache.spark.mllib.evaluation.MulticlassMetrics = org.apache.spark.mllib.evaluation.MulticlassMetrics@689f9dc8
scala> preandtrue
scala> preandtrue.first
def first(): (Double, Double)
scala> preandtrue.first()
res6: (Double, Double) = (0.0,0.0)
scala> val accuracy = metrics.accuracy
accuracy: Double = 0.885496183206106
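To make the accuracy figure concrete, here is a small plain-Scala sketch of the same calculation on (prediction, label) pairs. The four pairs are invented stand-ins for the contents of `preandtrue`, whose construction is not shown in full; presumably each test point is mapped to `(model.predict(features), label)`, but that is an assumption.

```scala
// Hypothetical (prediction, label) pairs standing in for the contents of preandtrue.
val pairs = Seq((0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0))

// Accuracy = correctly classified examples / all examples.
val correct = pairs.count { case (prediction, label) => prediction == label }
val toyAccuracy = correct.toDouble / pairs.size
```

Here three of the four pairs agree, so `toyAccuracy` is 0.75; `metrics.accuracy` reports the same ratio over the whole test set.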