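For kNN, the training set is the model: fitting only stores the training data, and the real work (computing distances and voting) happens at prediction time.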
The machine-learning workflow:
training set -> fit -> model -> predict (on new samples)
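The point that "the training set is the model" can be seen in a tiny sketch. It assumes NumPy, and `LazyOneNN` is a hypothetical name that uses k = 1 for brevity. Its `fit()` learns no parameters and only memorises the data, while `predict()` does all the computation.

```python
import numpy as np


class LazyOneNN:
    """Hypothetical minimal 1-nearest-neighbour classifier (illustration only)."""

    def fit(self, X_train, y_train):
        # Nothing is learned: the stored training data *is* the model.
        self._X = np.asarray(X_train, dtype=float)
        self._y = np.asarray(y_train)
        return self

    def predict(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        # Squared Euclidean distance from every query row to every stored row,
        # shape (n_queries, n_train), via broadcasting.
        d2 = ((X[:, None, :] - self._X[None, :, :]) ** 2).sum(axis=2)
        # Label of the closest stored sample for each query.
        return self._y[d2.argmin(axis=1)]


model = LazyOneNN().fit([[0.0, 0.0], [5.0, 5.0]], [0, 1])
print(model.predict([[1.0, 0.5], [4.0, 6.0]]))  # [0 1]
```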
# Import the classifier (the class name is long and hard to remember)
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
import matplotlib.pyplot as plt

# The raw data are ordinary Python lists
raw_data_x = [[3.3935, 2.3312],
              [3.1101, 1.7815],
              [1.3438, 3.3684],
              [3.5823, 4.6792],
              [2.2804, 2.8670],
              [7.4234, 4.6965],
              [5.7451, 3.5340],
              [9.1722, 2.5111],
              [7.7928, 3.4241],
              [7.9398, 0.7916]]
# 1: benign tumor, 0: malignant tumor
raw_data_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Convert the data to NumPy arrays
x_train = np.array(raw_data_x)
y_train = np.array(raw_data_y)

# The point to be predicted
x = np.array([8.0936, 3.3657])

knn_classifier = KNeighborsClassifier(n_neighbors=6)
knn_classifier.fit(x_train, y_train)
# predict() requires a 2-D array here, hence reshape(1, -1)
knn_classifier.predict(x.reshape(1, -1))
# output: array([1])
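scikit-learn estimators expect input of shape (n_samples, n_features), so a single 1-D sample has to be turned into a one-row matrix. The sketch below shows only the NumPy side of this (no scikit-learn needed). `reshape(1, -1)` means "one row, infer the number of columns", and several query points can be stacked and predicted in one call.

```python
import numpy as np

# The query point from above: a 1-D array with 2 features.
x = np.array([8.0936, 3.3657])
print(x.shape)      # (2,)

# reshape(1, -1): 1 row, and NumPy infers the column count (-1 -> 2).
x_2d = x.reshape(1, -1)
print(x_2d.shape)   # (1, 2)

# Several query points stacked into one 2-D array, one sample per row;
# such an array can be passed to predict() in a single call.
batch = np.vstack([x, [2.0, 3.0]])
print(batch.shape)  # (2, 2)
```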
# Wrap the code from the previous section into a class
import numpy as np
from collections import Counter
from math import sqrt


class my_knn_classifier:

    def __init__(self, k):
        """Initialize the kNN classifier"""
        assert k >= 1, "k must be valid"
        self.k = k
        self._x_train = None
        self._y_train = None

    def fit(self, x_train, y_train):
        """Train the classifier with x_train and y_train"""
        assert x_train.shape[0] == y_train.shape[0], \
            "the size of x_train must be equal to the size of y_train"
        assert self.k <= x_train.shape[0], \
            "the size of x_train must be at least k."
        self._x_train = x_train
        self._y_train = y_train
        return self

    def predict(self, x_predict):
        """Predict the data set x_predict and return the predictions"""
        assert self._x_train is not None and self._y_train is not None, \
            "must fit before prediction!"
        assert x_predict.shape[1] == self._x_train.shape[1], \
            "the number of features of x_predict must be equal to that of x_train"
        y_predict = [self._predict(x) for x in x_predict]
        return np.array(y_predict)

    def _predict(self, x):
        """Predict the label of a single sample x"""
        # Euclidean distance from x to every training sample
        distances = [sqrt(np.sum((x_train - x) ** 2)) for x_train in self._x_train]
        # Indices of the training samples, nearest first
        nearest = np.argsort(distances)
        # Labels of the k nearest neighbours
        top_k = [self._y_train[i] for i in nearest[:self.k]]
        # Majority vote: most_common(1) returns [(label, count)]
        votes = Counter(top_k)
        return votes.most_common(1)[0][0]
# Use the class on the same data as the scikit-learn example
x_predict = x.reshape(1, -1)
knn = my_knn_classifier(6)
knn.fit(x_train, y_train)
knn.predict(x_predict)
# output: array([1])
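The `_predict` method above computes the distances in a Python loop. As an alternative (a swapped-in technique, not the author's code), NumPy broadcasting can compute all distances in one call; the argsort, top-k and `Counter` voting steps stay the same. `predict_one_vectorized` is a hypothetical helper name.

```python
import numpy as np
from collections import Counter


def predict_one_vectorized(X_train, y_train, x, k):
    """Hypothetical vectorised variant of my_knn_classifier._predict."""
    # (n_samples, n_features) - (n_features,) broadcasts row by row,
    # so every Euclidean distance is computed in a single call.
    distances = np.linalg.norm(X_train - x, axis=1)
    # Indices of the training samples, nearest first.
    nearest = np.argsort(distances)
    # Labels of the k nearest neighbours.
    top_k = y_train[nearest[:k]]
    # Majority vote: most_common(1) returns [(label, count)].
    votes = Counter(top_k.tolist())
    return votes.most_common(1)[0][0]


X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_demo = np.array([0, 0, 1, 1])
print(predict_one_vectorized(X_demo, y_demo, np.array([0.2, 0.3]), k=3))  # 0
```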
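When shuffling, x and y are stored separately but correspond one-to-one, so they cannot be shuffled independently: doing so would lose the correspondence between samples and labels.
Option 1: concatenate x and y into one matrix, shuffle its rows, then split it back into x and y.
Option 2: shuffle the m sample indices, then index both x and y with the same permutation.
Option 2 is used here.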
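Both options can be compared on toy data in a short sketch. It assumes NumPy's `Generator` API (`np.random.default_rng`), and uses labels equal to the row index so that it is easy to check that every row still matches its label. Option 1 also assumes numeric labels, because x and y must share one dtype once they are merged.

```python
import numpy as np

# Toy data: 5 samples, 2 features; y[i] is the label of X[i].
X = np.arange(10).reshape(5, 2)
y = np.array([0, 1, 2, 3, 4])

rng = np.random.default_rng(0)

# Option 1: merge X and y into one matrix, shuffle its rows, split again.
data = np.hstack([X, y.reshape(-1, 1)])
rng.shuffle(data)                    # shuffles the rows in place
X1, y1 = data[:, :-1], data[:, -1]

# Option 2: shuffle the sample indices once and index X and y with them.
perm = rng.permutation(len(X))
X2, y2 = X[perm], y[perm]

# Check: in both cases each row still sits next to its original label.
ok1 = all(X[int(label)].tolist() == row.tolist() for row, label in zip(X1, y1))
ok2 = all(X[label].tolist() == row.tolist() for row, label in zip(X2, y2))
print(ok1, ok2)  # True True
```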
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

# Load the data from sklearn
iris = datasets.load_iris()
x = iris.data    # feature matrix: each row is a sample, each column a feature
y = iris.target  # labels

# Check the size of the data set
print("x: \n", x.shape, '\n y: \n', y.shape)
# x:
#  (150, 4)
#  y:
#  (150,)

# Shuffle the sample indices (a random permutation of 0 .. len(x)-1)
shuffle_index = np.random.permutation(len(x))
# Set the test/train ratio
test_ratio = 0.2
test_size = int(len(x) * test_ratio)
test_index = shuffle_index[:test_size]
train_index = shuffle_index[test_size:]

# Get the training and test sets
x_train = x[train_index]
y_train = y[train_index]
x_test = x[test_index]
y_test = y[test_index]
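The split steps above can be wrapped into a reusable function. This is a sketch under stated assumptions: `train_test_split` here is a hypothetical helper, not scikit-learn's `sklearn.model_selection.train_test_split` (which takes `test_size` and `random_state`), although it returns the four arrays in the same order. An optional `seed` makes the split reproducible.

```python
import numpy as np


def train_test_split(X, y, test_ratio=0.2, seed=None):
    """Hypothetical helper wrapping the index-shuffling split shown above."""
    assert X.shape[0] == y.shape[0], "X and y must have the same number of samples"
    assert 0.0 <= test_ratio <= 1.0, "test_ratio must be in [0, 1]"

    # A seeded generator gives the same permutation on every run.
    rng = np.random.default_rng(seed)
    shuffle_index = rng.permutation(len(X))

    test_size = int(len(X) * test_ratio)
    test_index = shuffle_index[:test_size]
    train_index = shuffle_index[test_size:]

    return X[train_index], X[test_index], y[train_index], y[test_index]
```

With the iris arrays loaded above, `train_test_split(x, y, test_ratio=0.2, seed=666)` would give 120 training and 30 test samples.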