Machine Learning in Action reading notes: the kNN algorithm (Part 1)

First, a few NumPy features that the kNN algorithm relies on:

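1. shape is a function defined in numpy.core.fromnumeric; it reports the size of a matrix or array along each dimension. Every array also exposes the same information as its .shape attribute.

Example:

Create a 4×2 matrix c. c.shape[0] is the length of the first dimension (the number of rows) and c.shape[1] is the length of the second dimension (the number of columns).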

>>> from numpy import array
>>> c = array([[1,1],[1,2],[1,3],[1,4]])
>>> c.shape
(4, 2)
>>> c.shape[0]
4
>>> c.shape[1]
2

2. The tile function in the NumPy module

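Signature: tile(a, reps)

Purpose: build a new array by repeating a along each dimension

Parameters:

- a: any array-like object

- reps: the number of times to repeat a along each dimension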

Examples:

tile([1,2],2)

Output: [1,2,1,2]

tile([1,2],(2,2))

Repetition order: [1,2] => [[1,2] , [1,2]] => [[1,2,1,2] , [1,2,1,2]]

tile([1,2],(2,2,3))

Repetition order: [1,2] => [[1,2] , [1,2]] => [[[1,2],[1,2]] , [[1,2],[1,2]]] => [[[1,2,1,2,1,2],[1,2,1,2,1,2]] , [[1,2,1,2,1,2],[1,2,1,2,1,2]]]
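In the kNN code later in these notes, tile copies the query point once per row of the data set so that the two can be subtracted element by element. The sketch below shows only that step. The names query and data and their values are illustrative assumptions, and the comparison with broadcasting is an addition that the original notes do not make.

```python
import numpy as np

# Hypothetical query point and data set, chosen only for illustration.
query = np.array([0.0, 0.0])
data = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])

# reps=(4, 1): stack 4 copies of the query vertically, 1 copy horizontally.
tiled = np.tile(query, (data.shape[0], 1))
diff_tiled = tiled - data

# NumPy broadcasting produces the same differences without the explicit copies.
diff_broadcast = query - data

print(tiled.shape)                                  # (4, 2)
print(np.array_equal(diff_tiled, diff_broadcast))   # True
```

Either form works for a data set of this size; tile makes the shape matching explicit, while broadcasting avoids allocating the repeated array.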

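3. sum(axis=1)

Most of this kind of data processing is still done with NumPy. Without an axis argument, sum adds up every element; axis=0 sums down each column (one result per column), and axis=1 sums across each row (one result per row).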

>>> import numpy as np
>>> a=np.sum([[0,1,2],[2,1,3]])
>>> a
9
>>> a.shape
()
>>> a=np.sum([[0,1,2],[2,1,3]],axis=0)
>>> a
array([2, 2, 5])
>>> a.shape
(3,)
>>> a=np.sum([[0,1,2],[2,1,3]],axis=1)
>>> a
array([3, 6])
>>> a.shape
(2,)
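For kNN, the useful case is axis=1: each row holds the coordinate differences for one data point, so summing the squared row gives that point's squared distance. A small sketch, with a made-up diff matrix chosen so the arithmetic is easy to check by hand:

```python
import numpy as np

# Hypothetical difference vectors, one data point per row.
diff = np.array([[3.0, 4.0], [6.0, 8.0]])
squared = diff ** 2

# axis=1 collapses the columns: one squared distance per row (per point).
row_sums = squared.sum(axis=1)
distances = np.sqrt(row_sums)

# axis=0 collapses the rows instead: one total per coordinate.
col_sums = squared.sum(axis=0)

print(distances)   # [ 5. 10.]
print(col_sums)    # [45. 80.]
```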

4. argsort()

argsort() returns the indices that would sort the array in ascending order; the array itself is left unchanged.

Example:

One-dimensional array

>>> a = array([3,1,2])
>>> argsort(a)
array([1, 2, 0])

Two-dimensional array

>>> b = array([[1,2],[2,3]])
>>> argsort(b,axis=1) # sort within each row
array([[0, 1],
       [0, 1]])
>>> argsort(b,axis=0) # sort within each column
array([[0, 0],
       [1, 1]])
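In kNN the sorted indices are what matter, because they let you look up the labels of the closest points. A minimal sketch, assuming a hypothetical distances array and a matching labels list:

```python
import numpy as np

# Hypothetical distances from a query to four training points, plus their labels.
distances = np.array([1.49, 1.41, 0.0, 0.1])
labels = ['a', 'a', 'b', 'b']

# Indices of the points ordered from nearest to farthest.
order = np.argsort(distances)

# Keep the first k indices and map them back to labels.
k = 3
nearest_labels = [labels[i] for i in order[:k]]

print(order.tolist())     # [2, 3, 1, 0]
print(nearest_labels)     # ['b', 'b', 'a']
```

Note that distances keeps its original order, which is why the indices can still be used against labels.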

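5. The get method

The get() method of a Python dictionary returns the value stored under the given key; if the key is not in the dictionary, it returns the default value instead.

Syntax of get():

dict.get(key, default=None)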
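The default argument is what makes get() convenient for counting: a label seen for the first time starts from 0 instead of raising KeyError. A short sketch, using a hypothetical list of votes:

```python
# Hypothetical labels of the k nearest neighbours.
votes = ['b', 'b', 'a']

class_count = {}
for label in votes:
    # get() returns 0 the first time a label appears, so no KeyError is raised.
    class_count[label] = class_count.get(label, 0) + 1

# With no default given, a missing key yields None.
missing = class_count.get('c')

print(class_count)   # {'b': 2, 'a': 1}
print(missing)       # None
```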

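6. iteritems()

Python dictionaries also have an items() method. The two differ slightly.

In Python 2, items() returns all of the dictionary's entries as a list of (key, value) pairs.

iteritems() does much the same thing, except that it returns an iterator rather than a list. It exists only in Python 2; in Python 3 it was removed, and items() returns a view object instead of a list.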

>>> d = {'1': 'one', '2': 'two', '3': 'three'}
>>> x = d.items()
>>> x
[('1', 'one'), ('3', 'three'), ('2', 'two')]
>>> type(x)
<type 'list'>
>>> y = d.iteritems()
>>> y
<dictionary-itemiterator object at 0x025008a0>
>>> type(y)
<type 'dictionary-itemiterator'>
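The session above comes from Python 2. For readers on Python 3, the sketch below shows the equivalent behaviour; the dictionary contents are an illustrative assumption. There, items() returns a view that tracks later changes to the dictionary, and iteritems() no longer exists.

```python
d = {'1': 'one', '2': 'two', '3': 'three'}

# On Python 3, items() returns a dict_items view rather than a list.
view = d.items()
view_type = type(view).__name__

# The view is live: it reflects entries added after it was created.
d['4'] = 'four'
view_length = len(view)

# Wrap the view in list() (or sorted()) when an actual list is needed.
pairs = sorted(d.items())

# iteritems() was removed in Python 3.
has_iteritems = hasattr(d, 'iteritems')

print(view_type)       # dict_items
print(view_length)     # 4
print(has_iteritems)   # False
```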

7. Implementing the kNN algorithm

from numpy import *
import operator


def createdataset():
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['a', 'a', 'b', 'b']
    return group, labels


def classify0(inx, dataset, labels, k):
    datasetsize = dataset.shape[0]
    # Repeat inx once per row of dataset, then take the Euclidean distance
    diffmat = tile(inx, (datasetsize, 1)) - dataset
    sqdiffmat = diffmat**2
    sqdistance = sqdiffmat.sum(axis=1)
    distances = sqdistance**0.5
    # argsort() sorts in ascending order by default, but it only returns
    # the indices and does not reorder the original array
    sorteddistance = distances.argsort()
    classcount = {}
    for i in range(k):
        # Count how often each class appears among the k nearest points
        votelabel = labels[sorteddistance[i]]
        classcount[votelabel] = classcount.get(votelabel, 0) + 1
    # sorted() is ascending by default, so reverse=True puts the most
    # frequent class first
    sortedclasscount = sorted(classcount.items(), key=operator.itemgetter(1), reverse=True)
    # Return the name of the class with the most votes
    return sortedclasscount[0][0]


if __name__ == "__main__":
    group, labels = createdataset()
    print('group is:')
    print(group)
    print('labels is:')
    print(labels)
    t = classify0([0, 0], group, labels, 3)
    print('t is:')
    print(t)

Output:

group is:
[[ 1.   1.1]
 [ 1.   1. ]
 [ 0.   0. ]
 [ 0.   0.1]]
labels is:
['a', 'a', 'b', 'b']
t is:
b
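For comparison, here is a more compact sketch of the same idea. It is not the book's code: the function name knn_classify is hypothetical, broadcasting replaces tile, and collections.Counter replaces the manual dictionary and sorted() call. It assumes the same four training points and labels as above.

```python
from collections import Counter

import numpy as np


def knn_classify(query, data, labels, k):
    """Return the majority label among the k rows of data nearest to query."""
    data = np.asarray(data, dtype=float)
    query = np.asarray(query, dtype=float)
    # Broadcasting subtracts query from every row; axis=1 sums per point.
    distances = np.sqrt(((data - query) ** 2).sum(axis=1))
    # Indices of the k smallest distances, nearest first.
    nearest = np.argsort(distances)[:k]
    votes = Counter(labels[i] for i in nearest)
    # most_common(1) returns a list holding the single (label, count) pair.
    return votes.most_common(1)[0][0]


group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['a', 'a', 'b', 'b']

print(knn_classify([0, 0], group, labels, 3))   # b
print(knn_classify([1, 1], group, labels, 3))   # a
```

With k=3 and the query [0, 0], the three nearest points carry the labels b, b and a, so the vote returns b.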
