Machine Learning: the KNN Algorithm (Splitting the Dataset)

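Note: you cannot simply take the first n samples of x as the training set. The target values y are sorted, e.g. [0, 0, ..., 1, 1, ..., 2, 2, ...], so the first n samples would cover only some of the classes.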
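A quick way to see the problem, as a minimal sketch assuming the iris dataset bundled with scikit-learn (the cut-off of 100 is only an illustrative choice):

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
y = iris.target  # sorted labels: 50 zeros, then 50 ones, then 50 twos

# Labels seen by a naive "first 100 samples" training set
naive_train_labels = np.unique(y[:100])
print(naive_train_labels)  # [0 1]: class 2 never appears
```

A model trained on such a slice would never see a single sample of class 2.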

Method: shuffle the original data first, then take the first n samples as the training set.

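x and y are stored separately, but their rows correspond one-to-one. They therefore must not be shuffled independently, or the correspondence between samples and labels is lost.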

Approach 1: merge x and y into a single matrix, shuffle the rows of that matrix, then split it back into x and y.
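Approach 1 could look roughly like the sketch below. The toy arrays are hypothetical, and `np.hstack` plus `np.random.shuffle` are just one way to do the merge and the shuffle:

```python
import numpy as np

# Hypothetical toy data: 6 samples, 2 features, labels sorted by class
x = np.arange(12, dtype=float).reshape(6, 2)
y = np.array([0, 0, 1, 1, 2, 2])

# Append y as an extra last column so each row carries its own label
data = np.hstack([x, y.reshape(-1, 1)])

# np.random.shuffle reorders the rows in place and keeps each row intact
np.random.shuffle(data)

# Split the matrix back into features and labels
x_shuffled = data[:, :-1]
y_shuffled = data[:, -1].astype(int)  # labels became floats in the merge
```

One drawback of this approach is that merging forces features and labels into a single dtype, so the labels have to be cast back afterwards.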

Approach 2: shuffle the m indices of the samples instead of the samples themselves.

Approach 2 is used here.

The permutation() function:

permutation(x)

    Randomly permute a sequence, or return a permuted range.

    If `x` is a multi-dimensional array, it is only shuffled along its
    first index.

    Parameters
    ----------
    x : int or array_like
        If `x` is an integer, randomly permute ``np.arange(x)``.
        If `x` is an array, make a copy and shuffle the elements
        randomly.

    Returns
    -------
    out : ndarray
        Permuted sequence or array range.

    Examples
    --------
    >>> np.random.permutation(10)
    array([1, 7, 4, 3, 0, 9, 2, 5, 8, 6])
    >>> np.random.permutation([1, 4, 9, 12, 15])
    array([15,  1,  9,  4, 12])
    >>> arr = np.arange(9).reshape((3, 3))
    >>> np.random.permutation(arr)
    array([[6, 7, 8],
           [0, 1, 2],
           [3, 4, 5]])
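To illustrate why shuffling indices keeps x and y aligned, here is a small sketch on hypothetical toy arrays. Indexing both arrays with the same permuted index array reorders them in exactly the same way:

```python
import numpy as np

# Hypothetical toy data: the label of each row is the tens digit of its values
x = np.array([[10, 11], [20, 21], [30, 31], [40, 41]])
y = np.array([1, 2, 3, 4])

# A random reordering of the indices 0..3
shuffle_index = np.random.permutation(len(x))

# Fancy indexing with the same index array keeps rows and labels paired
x_shuffled = x[shuffle_index]
y_shuffled = y[shuffle_index]

print(np.array_equal(x_shuffled[:, 0] // 10, y_shuffled))  # True
```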

1. Import the required modules and packages:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

2. Load the iris dataset:

iris = datasets.load_iris()
x = iris.data     # feature matrix of the dataset
y = iris.target   # label vector, one class label per sample

Check the size of the dataset:
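A minimal sketch of that check, reloading the data so the snippet runs on its own. The iris data bundled with scikit-learn has 150 samples with 4 features each:

```python
from sklearn import datasets

iris = datasets.load_iris()
x = iris.data
y = iris.target

print(x.shape)  # (150, 4)
print(y.shape)  # (150,)
```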

3. Split the dataset

(1) Shuffle the indices:

shuffle_index = np.random.permutation(len(x))

(2) Choose the proportion of the test set and compute the indices for the test set and the training set:

# proportion of the data used for testing
test_ratio = 0.2
# number of samples in the test set
test_size = int(len(x) * test_ratio)
# indices of the test samples
test_index = shuffle_index[:test_size]
# indices of the training samples
train_index = shuffle_index[test_size:]

(3) Build the training set and the test set:

# training set
x_train = x[train_index]
y_train = y[train_index]
# test set
x_test = x[test_index]
y_test = y[test_index]

You can check their sizes:
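The sketch below repeats the whole split so it runs on its own, then prints the shapes. For comparison it also calls scikit-learn's own `train_test_split`, which is not part of the walkthrough above but performs the same kind of shuffled split:

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x, y = iris.data, iris.target

# Manual split, following steps (1) to (3)
shuffle_index = np.random.permutation(len(x))
test_ratio = 0.2
test_size = int(len(x) * test_ratio)
test_index = shuffle_index[:test_size]
train_index = shuffle_index[test_size:]
x_train, y_train = x[train_index], y[train_index]
x_test, y_test = x[test_index], y[test_index]

print(x_train.shape, y_train.shape)  # (120, 4) (120,)
print(x_test.shape, y_test.shape)    # (30, 4) (30,)

# Same proportions using scikit-learn's helper
sk_x_train, sk_x_test, sk_y_train, sk_y_test = train_test_split(
    x, y, test_size=0.2
)
print(sk_x_train.shape, sk_x_test.shape)  # (120, 4) (30, 4)
```

Both splits are random, so the exact samples differ between runs. Calling `np.random.seed` before the manual shuffle, or passing `random_state` to `train_test_split`, makes them reproducible.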
