How scikit-learn wraps machine learning algorithms

import numpy as np
from math import sqrt
from collections import Counter

# Given k and the training set (x_train, y_train), return the predicted label y
# for x, the sample we want to classify
def knn_classify(k, x_train, y_train, x):
    assert 1 <= k <= x_train.shape[0], "k must be valid"
    assert x_train.shape[0] == y_train.shape[0], \
        "the size of x_train must be equal to the size of y_train"
    assert x_train.shape[1] == x.shape[0], \
        "the feature number of x must be equal to x_train"

    # The algorithm itself takes only five lines
    distances = [sqrt(np.sum((row - x) ** 2)) for row in x_train]
    nearest = np.argsort(distances)
    topk_y = [y_train[i] for i in nearest[:k]]
    votes = Counter(topk_y)
    return votes.most_common(1)[0][0]

With the function written,

we can load it in Jupyter Notebook and pass in the arguments:

%run knn_function/knn.py

predict_y = knn_classify(6, x_train, y_train, x)

and we get the result.
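To see how the five core lines arrive at a prediction, here is a step-by-step trace on made-up data. The arrays and the choice k = 3 are hypothetical and only serve as an illustration; they are not the data used in the original notebook.

```python
import numpy as np
from math import sqrt
from collections import Counter

# Hypothetical toy data: 6 training samples with 2 features each
x_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [3.0, 3.0], [3.2, 2.9], [2.8, 3.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])
x = np.array([2.9, 3.0])  # the sample to classify
k = 3                     # assumed value, chosen for the example

# 1. Euclidean distance from x to every training sample
distances = [sqrt(np.sum((row - x) ** 2)) for row in x_train]
# 2. Indices of the training samples, sorted from nearest to farthest
nearest = np.argsort(distances)
# 3. Labels of the k nearest samples
topk_y = [y_train[i] for i in nearest[:k]]
# 4. Count the votes for each label
votes = Counter(topk_y)
# 5. The label with the most votes is the prediction
predict_y = votes.most_common(1)[0][0]

print(nearest[:k], topk_y, predict_y)
```

The three nearest neighbours all carry label 1, so the vote is unanimous here.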

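You may notice that our kNN algorithm never produced a model, and that is indeed the case.

This is a characteristic of kNN.

You could say kNN is just about the only machine learning algorithm that needs no training process.

k-nearest neighbors is very unusual in this respect: it can be regarded as an algorithm without a model.

To stay consistent with other algorithms, we can treat the training data set itself as the model.

The comparatively complex part is the prediction process.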

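kNN in scikit-learn

from sklearn.neighbors import KNeighborsClassifier  # wraps the kNN algorithm; every algorithm in scikit-learn is wrapped in an object-oriented way

knn_classifier = KNeighborsClassifier(n_neighbors=6)  # the argument is k

knn_classifier.fit(x_train, y_train)  # fit the training data

Notice that this call returns a value: the estimator object itself, so we do not need to capture it.

knn_classifier.predict(x)  # now we can predict x

array([1])  # the output: the result is 1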

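However, this call triggered a warning: passing a one-dimensional array as data is deprecated and will raise a ValueError. (Current scikit-learn releases raise that ValueError directly.)

scikit-learn expects a matrix, that is, a 2-D array.

For example, to predict ten samples at once, put the ten samples into one matrix and pass it to predict, rather than passing them to predict one at a time.

Even if we only have a single sample to predict, it must be arranged as a matrix: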

x_predict = x.reshape(1,-1)

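The first dimension is 1 because we have only one sample.

The second dimension is -1, which lets NumPy work out its size automatically.

This gives a 1×2 matrix,

that is, one sample with 2 features.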
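A minimal sketch of what the reshape does to the array's shape. The feature values are made up for the illustration.

```python
import numpy as np

x = np.array([8.09, 3.37])    # hypothetical sample with 2 features
x_predict = x.reshape(1, -1)  # 1 row; NumPy infers the number of columns

print(x.shape)          # (2,)
print(x_predict.shape)  # (1, 2)
```

The values are unchanged; only the shape goes from a flat vector to a matrix with a single row.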

Calling knn_classifier.predict(x_predict)

no longer produces the warning.

We can store the result in y_predict:

y_predict = knn_classifier.predict(x_predict)

y_predict[0] is the result.

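The scikit-learn workflow

1. Import the relevant machine learning algorithm from scikit-learn.

2. Create an instance of the algorithm, passing in its parameters.

3. Call fit on the training data set.

4. Call predict.

scikit-learn's supervised learning algorithms all follow this pattern.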
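The four steps can be sketched end to end as follows. The training data and the value n_neighbors=3 are hypothetical, chosen only so the example is self-contained; it assumes scikit-learn and NumPy are installed.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier  # step 1: import the algorithm

# Hypothetical toy data: 6 training samples with 2 features each
x_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [3.0, 3.0], [3.2, 2.9], [2.8, 3.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

knn_classifier = KNeighborsClassifier(n_neighbors=3)  # step 2: create an instance
knn_classifier.fit(x_train, y_train)                  # step 3: fit the training data

# step 4: predict; the input is a matrix with one sample per row
x_predict = np.array([[2.9, 3.0], [1.1, 0.9]])
y_predict = knn_classifier.predict(x_predict)
print(y_predict)
```

Both samples are passed in one call, so y_predict holds one label per row of x_predict.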

import numpy as np
from math import sqrt
from collections import Counter
from .metrics import accuracy_score

class KNNClassifier:

    def __init__(self, k):
        """Initialize the kNN classifier"""
        assert k >= 1, "k must be valid"
        self.k = k
        self._x_train = None
        self._y_train = None

    def fit(self, x_train, y_train):
        """Fit the kNN classifier on the training data x_train and y_train"""
        assert x_train.shape[0] == y_train.shape[0], \
            "the size of x_train must be equal to the size of y_train"
        assert self.k <= x_train.shape[0], \
            "the size of x_train must be at least k."

        # kNN has no real training step: fit only stores the data
        self._x_train = x_train
        self._y_train = y_train
        return self

    def predict(self, x_predict):
        """Given the data set x_predict, return the vector of predictions for it"""
        assert self._x_train is not None and self._y_train is not None, \
            "must fit before predict!"
        assert x_predict.shape[1] == self._x_train.shape[1], \
            "the feature number of x_predict must be equal to x_train"

        y_predict = [self._predict(x) for x in x_predict]
        return np.array(y_predict)

    def _predict(self, x):
        """Given a single sample x, return its predicted label"""
        assert x.shape[0] == self._x_train.shape[1], \
            "the feature number of x must be equal to x_train"

        distances = [sqrt(np.sum((x_train - x) ** 2))
                     for x_train in self._x_train]
        nearest = np.argsort(distances)

        topk_y = [self._y_train[i] for i in nearest[:self.k]]
        votes = Counter(topk_y)

        return votes.most_common(1)[0][0]

    def __repr__(self):
        return "KNN(k=%d)" % self.k
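The class above computes distances with a Python loop over the training samples. As an alternative sketch, not the author's implementation, the same prediction can be written with NumPy broadcasting so that all distances are computed at once. The function name knn_predict_batch and the toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict_batch(x_train, y_train, x_predict, k):
    """Hypothetical helper: classify every row of x_predict with kNN."""
    # Broadcasting gives an array of shape (n_predict, n_train, n_features)
    diffs = x_predict[:, np.newaxis, :] - x_train[np.newaxis, :, :]
    # Euclidean distances, shape (n_predict, n_train)
    distances = np.sqrt(np.sum(diffs ** 2, axis=2))
    # Indices of the k nearest training samples for each row of x_predict
    nearest = np.argsort(distances, axis=1)[:, :k]
    # Majority vote among the k nearest labels, one vote per row
    return np.array([Counter(y_train[idx]).most_common(1)[0][0]
                     for idx in nearest])

# Hypothetical toy data
x_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [3.0, 3.0], [3.2, 2.9], [2.8, 3.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])
x_predict = np.array([[2.9, 3.0], [1.1, 0.9]])

y_predict = knn_predict_batch(x_train, y_train, x_predict, k=3)
print(y_predict)
```

The trade-off is memory: the intermediate array grows with the number of prediction samples times the number of training samples times the number of features, so the loop version can be the better choice for large data sets.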
