Model prototype
sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=1, **kwargs)
Parameters
algorithm: the algorithm used to compute the nearest neighbors ('ball_tree', 'kd_tree', 'brute', or 'auto')
leaf_size: leaf size passed to BallTree or KDTree
p: the exponent of the Minkowski metric
n_jobs: the number of parallel jobs used for the neighbor search
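To illustrate how these parameters fit together, the classifier can be instantiated with non-default values. This is a minimal sketch on a tiny synthetic two-class dataset (the data points are made up for illustration):

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy data: two well-separated classes in 2-D (synthetic, for illustration)
X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
y = [0, 0, 0, 1, 1, 1]

# algorithm='kd_tree' forces a KDTree; leaf_size controls tree granularity;
# p=1 makes the Minkowski metric equivalent to Manhattan distance;
# weights='distance' weights each neighbor's vote by inverse distance
clf = KNeighborsClassifier(n_neighbors=3, weights='distance',
                           algorithm='kd_tree', leaf_size=30, p=1)
clf.fit(X, y)
print(clf.predict([[0.5, 0.5], [5.5, 5.5]]))  # -> [0 1]
```

Each query point is assigned the class of its three nearest training points, so the point near the origin falls in class 0 and the point near (5.5, 5.5) in class 1.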
Methods
import numpy as np
import matplotlib.pyplot as plt
from sklearn import neighbors, datasets, model_selection  # cross_validation was removed; model_selection replaces it
Load the dataset
def load_classification_data():
    # Load the digits dataset and split it into stratified train/test sets
    digits = datasets.load_digits()
    x_train = digits.data
    y_train = digits.target
    return model_selection.train_test_split(x_train, y_train,
        test_size=0.25, random_state=0, stratify=y_train)
Using KNeighborsClassifier
def test_kneighborsclassifier(*data):
    # Fit a default KNeighborsClassifier and report train/test accuracy
    x_train, x_test, y_train, y_test = data
    clf = neighbors.KNeighborsClassifier()
    clf.fit(x_train, y_train)
    print('training score:%f' % clf.score(x_train, y_train))
    print('testing score:%f' % clf.score(x_test, y_test))

x_train, x_test, y_train, y_test = load_classification_data()
test_kneighborsclassifier(x_train, x_test, y_train, y_test)
Effect of k and the voting strategy
def test_kneighborsclassifier_k_w(*data):
    # Plot train/test accuracy as k varies, for both voting strategies
    x_train, x_test, y_train, y_test = data
    ks = np.linspace(1, y_train.size, num=100, endpoint=False, dtype='int')
    weights = ['uniform', 'distance']
    # Plot
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    for weight in weights:
        training_scores = []
        testing_scores = []
        for k in ks:
            clf = neighbors.KNeighborsClassifier(weights=weight,
                n_neighbors=k)
            clf.fit(x_train, y_train)
            testing_scores.append(clf.score(x_test, y_test))
            training_scores.append(clf.score(x_train, y_train))
        ax.plot(ks, testing_scores, label='testing score:weight=%s' % weight)
        ax.plot(ks, training_scores, label='training score:weight=%s' % weight)
    ax.legend(loc='best')
    ax.set_xlabel('k')
    ax.set_ylabel('score')
    ax.set_ylim(0, 1.05)
    ax.set_title('KNeighborsClassifier')
    plt.show()

x_train, x_test, y_train, y_test = load_classification_data()
test_kneighborsclassifier_k_w(x_train, x_test, y_train, y_test)
Effect of p
def test_kneighborsclassifier_k_p(*data):
    # Plot train/test accuracy as k varies, for several Minkowski exponents p
    x_train, x_test, y_train, y_test = data
    ks = np.linspace(1, y_train.size, endpoint=False, dtype='int')
    ps = [1, 2, 10]
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    for p in ps:
        training_scores = []
        testing_scores = []
        for k in ks:
            clf = neighbors.KNeighborsClassifier(p=p, n_neighbors=k)
            clf.fit(x_train, y_train)
            testing_scores.append(clf.score(x_test, y_test))
            training_scores.append(clf.score(x_train, y_train))
        ax.plot(ks, testing_scores, label='testing score:p=%d' % p)
        ax.plot(ks, training_scores, label='training score:p=%d' % p)
    ax.legend(loc='best')
    ax.set_xlabel('k')
    ax.set_ylabel('score')
    ax.set_ylim(0, 1.05)
    ax.set_title('KNeighborsClassifier')
    plt.show()

x_train, x_test, y_train, y_test = load_classification_data()
test_kneighborsclassifier_k_p(x_train, x_test, y_train, y_test)
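To make the role of p concrete: the Minkowski distance between points x and y is (Σ|x_i − y_i|^p)^(1/p), so p=1 gives Manhattan distance and p=2 gives Euclidean distance, while large p approaches the largest single coordinate difference. A minimal check with NumPy (the two points here are hypothetical):

```python
import numpy as np

def minkowski(x, y, p):
    # Minkowski distance: (sum |x_i - y_i|^p) ** (1/p)
    return np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** p) ** (1.0 / p)

a, b = [0, 0], [3, 4]
print(minkowski(a, b, 1))   # Manhattan distance: 7.0
print(minkowski(a, b, 2))   # Euclidean distance: 5.0
print(minkowski(a, b, 10))  # approaches max coordinate difference (4) as p grows
```

This is why the plot above can differ across p: changing the exponent changes which training points count as "nearest".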