I. Overview
The kNN algorithm classifies samples by measuring the distances between their feature values. For each point in a dataset with unknown class labels, it performs the following steps:
(1) Compute the distance between the current point and every point in the dataset with known class labels;
(2) Sort the distances in ascending order;
(3) Select the k points closest to the current point;
(4) Count the frequency of each class among these k points;
(5) Return the most frequent class among the k points as the predicted class of the current point.
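The five steps above can be sketched directly with NumPy (a minimal illustration using made-up 2-D points; the full implementations follow in the next section):

```python
import numpy as np

# made-up training points and labels (illustrative only)
train = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = np.array([1, 1, 2, 2])
query = np.array([0.1, 0.1])
k = 3

# (1) distances from the query point to every known point
dist = np.sqrt(((train - query) ** 2).sum(axis=1))
# (2)-(3) indices of the k nearest points
nearest = dist.argsort()[:k]
# (4)-(5) most frequent label among the k neighbours
values, counts = np.unique(labels[nearest], return_counts=True)
print(values[counts.argmax()])  # → 2
```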
II. Implementation
1. Implementation with the scikit-learn package
import numpy as np
from sklearn import neighbors

def split_data(data, test_size):
    # randomly split the rows of data into a training set and a test set
    data_num = data.shape[0]
    train_ind = list(range(data_num))
    test_ind = []
    test_num = int(data_num * test_size)
    for i in range(test_num):
        rand_ind = np.random.randint(0, len(train_ind))
        test_ind.append(train_ind[rand_ind])
        del train_ind[rand_ind]
    train_data = data[train_ind]
    test_data = data[test_ind]
    return train_data, test_data

# load the data and divide it into training and test sets
mydata = np.loadtxt(open("iris.txt", "rb"), delimiter=",", skiprows=0)
train_data, test_data = split_data(mydata, 0.3)
n = mydata.shape[1]
test_label = test_data[:, n-1]
test_data = test_data[:, 0:n-1]
train_label = train_data[:, n-1]
train_data = train_data[:, 0:n-1]

# build and fit the knn classifier
knn = neighbors.KNeighborsClassifier()
knn.fit(train_data, train_label)
print(knn.predict(test_data))
Running the script prints the predicted class labels for the test set.
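The printed predictions alone are hard to judge, so it can help to quantify them with the classifier's score method. A minimal sketch using scikit-learn's bundled iris dataset (so it runs without the iris.txt file assumed above):

```python
import numpy as np
from sklearn import datasets, neighbors

iris = datasets.load_iris()
X, y = iris.data, iris.target

# same idea as split_data: hold out 30% of the rows at random
rng = np.random.default_rng(0)
test_ind = rng.choice(len(X), size=int(len(X) * 0.3), replace=False)
train_ind = np.setdiff1d(np.arange(len(X)), test_ind)

knn = neighbors.KNeighborsClassifier()  # default n_neighbors=5
knn.fit(X[train_ind], y[train_ind])
print(knn.score(X[test_ind], y[test_ind]))  # fraction of correct predictions
```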
2. Step-by-step implementation in Python
import numpy as np
import operator

def split_data(data, test_size):
    # randomly split the rows of data into a training set and a test set
    data_num = data.shape[0]
    train_ind = list(range(data_num))
    test_ind = []
    test_num = int(data_num * test_size)
    for i in range(test_num):
        rand_ind = np.random.randint(0, len(train_ind))
        test_ind.append(train_ind[rand_ind])
        del train_ind[rand_ind]
    train_data = data[train_ind]
    test_data = data[test_ind]
    return train_data, test_data

def createdataset():
    # a tiny hand-made dataset for quick testing
    group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    # labels = ['a', 'a', 'b', 'b']
    labels = np.array([1, 1, 2, 2])
    return group, labels

def classify0(inx, dataset, labels, k):
    # classify a single sample inx against the labelled dataset
    datasetsize = dataset.shape[0]
    diffmat = np.tile(inx, (datasetsize, 1)) - dataset
    sqdiffmat = diffmat ** 2
    sqdistances = sqdiffmat.sum(axis=1)
    distances = sqdistances ** 0.5
    sorteddistindicies = distances.argsort()
    classcount = {}
    for i in range(k):
        voteilabel = labels[sorteddistindicies[i]]
        classcount[voteilabel] = classcount.get(voteilabel, 0) + 1
    sortedclasscount = sorted(classcount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedclasscount[0][0]

def classify1(inx, dataset, labels, k):
    # classify every row of inx and return the list of predicted labels
    result_ind = []
    inx_size = inx.shape[0]
    datasetsize = dataset.shape[0]
    for i in range(inx_size):
        diffmat = np.tile(inx[i, :], (datasetsize, 1)) - dataset
        sqdiffmat = diffmat ** 2
        sqdistances = sqdiffmat.sum(axis=1)
        distances = sqdistances ** 0.5
        sorteddistindicies = distances.argsort()
        classcount = {}
        for j in range(k):
            voteilabel = labels[sorteddistindicies[j]]
            classcount[voteilabel] = classcount.get(voteilabel, 0) + 1
        sortedclasscount = sorted(classcount.items(), key=operator.itemgetter(1), reverse=True)
        ind = sortedclasscount[0][0]
        result_ind.append(ind)
    return result_ind

# load the data and divide it into training and test sets
mydata = np.loadtxt(open("iris.txt", "rb"), delimiter=",", skiprows=0)
train_data, test_data = split_data(mydata, 0.3)
n = mydata.shape[1]
test_label = test_data[:, n-1]
test_data = test_data[:, 0:n-1]
train_label = train_data[:, n-1]
train_data = train_data[:, 0:n-1]

# test code -- classify0
result_ind = []
for i in range(len(test_data)):
    ind = classify0(test_data[i, :], train_data, train_label, 7)
    result_ind.append(ind)
print(result_ind)

# # test code -- classify1
# result_index = classify1(test_data, train_data, train_label, 3)
# print(result_index)
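Once result_ind has been collected, the classification accuracy can be computed by comparing it against test_label. A minimal self-contained sketch of that comparison (using made-up stand-in arrays, since the real ones come from the script above):

```python
import numpy as np

# stand-ins for test_label and result_ind from the script above
test_label = np.array([0, 1, 2, 1, 0])
result_ind = [0, 1, 1, 1, 0]

errors = sum(1 for pred, true in zip(result_ind, test_label) if pred != true)
accuracy = 1 - errors / len(test_label)
print(accuracy)  # → 0.8
```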
Running the script prints the list of class labels predicted by classify0 for the test samples.