Problem: binary classification

Experimental data: since we are building a binary classifier, the label column of the MNIST train.csv is adjusted slightly: labels equal to 0 stay 0, and labels greater than 0 become 1. This turns the ten-class dataset into a binary one. The result is available as train_binary.csv.

Implementation:
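The relabeling step described above can be sketched as follows (shown on a small in-memory stand-in frame for illustration; the commented lines show the same transformation on the real file, with paths assumed):

```python
import pandas as pd

# small stand-in for MNIST's train.csv: a label column plus pixel columns
df = pd.DataFrame({'label': [0, 3, 7, 0, 5],
                   'pixel0': [0, 12, 255, 0, 9]})

# labels equal to 0 stay 0; labels greater than 0 become 1
df['label'] = (df['label'] > 0).astype(int)
print(df['label'].tolist())  # -> [0, 1, 1, 0, 1]

# with the real file (paths assumed):
# raw = pd.read_csv('train.csv')
# raw['label'] = (raw['label'] > 0).astype(int)
# raw.to_csv('train_binary.csv', index=False)
```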
```python
# encoding=utf-8
import random
import time

import pandas as pd
from sklearn.model_selection import train_test_split  # cross_validation was renamed in newer sklearn
from sklearn.metrics import accuracy_score


class Perceptron(object):

    def __init__(self):
        self.learning_step = 0.001  # learning rate
        self.max_iteration = 5000   # once this many correct classifications occur, training is considered done

    def train(self, features, labels):
        # initialize w to 0; the bias b is stored in the last position
        self.w = [0.0] * (len(features[0]) + 1)

        correct_count = 0  # number of correct classifications
        while correct_count < self.max_iteration:
            # pick a sample (xi, yi) at random
            index = random.randint(0, len(labels) - 1)
            x = list(features[index])
            x.append(1.0)  # append a constant 1 so the last weight acts as the bias b
            y = 2 * labels[index] - 1  # map label 1 to +1 and label 0 to -1

            # compute w*xi + b
            wx = sum([self.w[j] * x[j] for j in range(len(self.w))])

            # if yi*(w*xi + b) > 0, count one more correct classification
            if wx * y > 0:
                correct_count += 1
                continue

            # if yi*(w*xi + b) <= 0, update w (whose last entry is actually b)
            for i in range(len(self.w)):
                self.w[i] += self.learning_step * (y * x[i])

    def predict_(self, x):
        wx = sum([self.w[j] * x[j] for j in range(len(self.w))])
        return int(wx > 0)  # return 1 if w*xi + b > 0, else 0

    def predict(self, features):
        labels = []
        for feature in features:
            x = list(feature)
            x.append(1.0)  # same bias trick as in train
            labels.append(self.predict_(x))
        return labels


if __name__ == '__main__':
    print("start read data")
    time_1 = time.time()

    # read the csv, treating the first row as the header; returns a DataFrame
    raw_data = pd.read_csv('../data/train_binary.csv', header=0)
    data = raw_data.values

    features = data[::, 1::]
    labels = data[::, 0]

    # to avoid overfitting, hold out a random 33% of the data as the test set
    train_features, test_features, train_labels, test_labels = train_test_split(
        features, labels, test_size=0.33, random_state=0)
    time_2 = time.time()
    print('read data cost %f seconds' % (time_2 - time_1))

    print('start training')
    p = Perceptron()
    p.train(train_features, train_labels)
    time_3 = time.time()
    print('training cost %f seconds' % (time_3 - time_2))

    print('start predicting')
    test_predict = p.predict(test_features)
    time_4 = time.time()
    print('predicting cost %f seconds' % (time_4 - time_3))

    score = accuracy_score(test_labels, test_predict)
    print("the accuracy score is %f" % score)
```
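As a sanity check of the update rule `w <- w + eta * y * x` used above, here is a tiny self-contained run on a hand-made 2-D linearly separable set (points and labels invented for illustration), cycling over the data deterministically instead of sampling at random:

```python
# Toy demonstration of the perceptron update rule on a 2-D
# linearly separable dataset (data invented for illustration).
X = [[0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 2.0]]
Y = [-1, 1, 1, 1]          # separable by the line x1 + x2 = 1

eta = 0.5                  # learning step
w = [0.0, 0.0, 0.0]        # last entry plays the role of the bias b

for _ in range(100):       # far more epochs than convergence needs
    for i in range(len(Y)):
        x = X[i] + [1.0]   # append 1 so w[-1] acts as b
        wx = sum(w[j] * x[j] for j in range(3))
        if Y[i] * wx <= 0:                 # misclassified: update
            for j in range(3):
                w[j] += eta * Y[i] * x[j]

# after training, every point satisfies y * (w*x + b) > 0
print(all(y * (w[0] * p[0] + w[1] * p[1] + w[2]) > 0
          for p, y in zip(X, Y)))  # -> True
```

Because the data is linearly separable, the perceptron convergence theorem guarantees the loop stops making updates after a finite number of mistakes.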
The code is also available on my github.

Run result:

Implementation (using sklearn):
```python
# encoding=utf-8
import time

import pandas as pd
from sklearn.model_selection import train_test_split  # cross_validation was renamed in newer sklearn
from sklearn.metrics import accuracy_score
from sklearn.linear_model import Perceptron

if __name__ == '__main__':
    print("start read data...")
    time_1 = time.time()

    # read the csv, treating the first row as the header; returns a DataFrame
    raw_data = pd.read_csv('../data/train_binary.csv', header=0)
    data = raw_data.values

    features = data[::, 1::]
    labels = data[::, 0]

    # hold out a random 33% of the data as the test set
    train_features, test_features, train_labels, test_labels = train_test_split(
        features, labels, test_size=0.33, random_state=0)
    time_2 = time.time()
    print('read data cost %f seconds' % (time_2 - time_1))

    print('start training...')
    # alpha is the regularization strength; max_iter caps the number of epochs
    clf = Perceptron(alpha=0.0001, max_iter=2000)
    clf.fit(train_features, train_labels)
    time_3 = time.time()
    print('training cost %f seconds' % (time_3 - time_2))

    print('start predicting...')
    test_predict = clf.predict(test_features)
    time_4 = time.time()
    print('predicting cost %f seconds' % (time_4 - time_3))

    score = accuracy_score(test_labels, test_predict)
    print("the accuracy score is %f" % score)
```
The code is also available on my github.

Run result:
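One caveat on the parameters: in scikit-learn's `Perceptron`, `alpha` is the regularization strength (only applied when `penalty` is set), while the step size is controlled by `eta0`. A minimal sketch on an invented toy set:

```python
from sklearn.linear_model import Perceptron

# four toy points, perfectly separable by the first feature
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]

# eta0 is the learning-rate constant; tol=None runs the full max_iter epochs
clf = Perceptron(eta0=1.0, max_iter=100, tol=None, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # expected 1.0 on this separable set
```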