Advanced Programming Techniques: Week 15 Assignment

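This week gave a brief introduction to the sklearn library. In short, sklearn is a machine learning library built on top of foundational numerical libraries such as numpy and scipy, and it provides a range of machine learning algorithms.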
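Every sklearn model used in this assignment follows the same estimator interface: construct the model, call `fit(X, y)` on training data, then `predict(X)` on new data. A minimal sketch, assuming a synthetic dataset from `make_classification` and an arbitrary 75/25 split via `train_test_split` (neither the split nor the `random_state` values come from the original assignment):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification data (sizes chosen for illustration only)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = GaussianNB()                    # 1. construct the estimator
model.fit(X_train, y_train)             # 2. learn parameters from the training data
pred = model.predict(X_test)            # 3. predict labels for unseen samples
accuracy = model.score(X_test, y_test)  # mean accuracy on the held-out samples

print(pred[:10], accuracy)
```

Swapping `GaussianNB()` for `SVC()` or `RandomForestClassifier()` leaves the rest of the code unchanged, which is what makes comparing several algorithms straightforward.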

Create a classification dataset (n_samples >= 1000, n_features >= 10)

Split the dataset using 10-fold cross validation

Train the algorithms

Evaluate the cross-validated performance

Write a short report summarizing the methodology and the results
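The first two tasks can be checked before any model is trained. A short sketch, assuming the same `make_classification` parameters as the implementation later in this post (2000 samples, 15 features) plus an arbitrary `random_state`; it shows that 10-fold splitting of 2000 samples gives test folds of 200 samples each, and that every sample lands in exactly one test fold:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold

# Task 1: 2000 samples x 15 features, two classes
X, y = make_classification(
    n_samples=2000, n_features=15, n_informative=2, n_redundant=2,
    n_repeated=0, n_classes=2, random_state=0,
)

# Task 2: 10-fold cross validation with shuffling
kf = KFold(n_splits=10, shuffle=True, random_state=0)
test_sizes = []
seen = np.zeros(len(X), dtype=int)  # how many times each sample is used for testing
for train_index, test_index in kf.split(X):
    test_sizes.append(len(test_index))
    seen[test_index] += 1

print(X.shape, test_sizes)  # (2000, 15) and ten folds of 200
print(np.all(seen == 1))    # each sample is tested exactly once
```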

In short, the assignment is to apply three sklearn models to a classification dataset and evaluate their performance.

Using sklearn follows the same steps as listed above:

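1. Create the dataset

2. Split the dataset for cross validation

3. Train the models

4. Apply the models (predict on the test data)

5. Evaluate the models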
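sklearn can also run steps 2 to 5 in a single call. A sketch using `sklearn.model_selection.cross_validate` with several scorers; this is an alternative to the hand-written loop, not the author's code, and the model subset and `random_state` values are assumptions chosen to keep the runtime short. Unlike an AUC computed from hard 0/1 labels, the `"roc_auc"` scorer ranks samples by `predict_proba` or `decision_function`, so it yields a genuine ROC AUC:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(
    n_samples=2000, n_features=15, n_informative=2, n_redundant=2,
    n_repeated=0, n_classes=2, random_state=0,
)

models = {
    "GaussianNB": GaussianNB(),
    "SVC[C=1e00]": SVC(C=1.0),
    "RandomForestClassifier[n_estimators=100]": RandomForestClassifier(n_estimators=100, random_state=0),
}

cv = KFold(n_splits=10, shuffle=True, random_state=0)
summary = {}
for name, model in models.items():
    # cross_validate clones the model, fits it on each training split,
    # and scores it on the matching test split
    scores = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "f1", "roc_auc"])
    # average each metric over the 10 folds
    summary[name] = {m: scores["test_" + m].mean() for m in ("accuracy", "f1", "roc_auc")}
    print(name, summary[name])
```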

Here is the implementation:

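from sklearn import datasets
from sklearn import metrics
from sklearn.model_selection import KFold  # replaces the removed sklearn.cross_validation module
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# 1. Create the dataset: 2000 samples, 15 features, 2 classes
X, y = datasets.make_classification(
    n_samples=2000, n_features=15, n_informative=2, n_redundant=2, n_repeated=0, n_classes=2
)

algorithm = [
    'GaussianNB',
    'SVC[C=1e-02]',
    'SVC[C=1e-01]',
    'SVC[C=1e00]',
    'SVC[C=1e01]',
    'SVC[C=1e02]',
    'RandomForestClassifier[n_estimators=10]',
    'RandomForestClassifier[n_estimators=100]',
    'RandomForestClassifier[n_estimators=1000]'
]

def make_models():
    # A fresh, untrained model for each entry of `algorithm`, in the same order.
    # SVC uses its default RBF kernel here (an assumption; other SVC settings are not given).
    return ([GaussianNB()]
            + [SVC(C=c) for c in (1e-02, 1e-01, 1e00, 1e01, 1e02)]
            + [RandomForestClassifier(n_estimators=n) for n in (10, 100, 1000)])

# Per-fold scores for each algorithm
acc = [[] for _ in algorithm]
f1 = [[] for _ in algorithm]
auc = [[] for _ in algorithm]

# 2. Split the dataset with 10-fold cross validation
kf = KFold(n_splits=10, shuffle=True)
for train_index, test_index in kf.split(X):
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test = X[test_index], y[test_index]
    for i, clf in enumerate(make_models()):
        clf.fit(X_train, y_train)   # 3. train the model
        pred = clf.predict(X_test)  # 4. apply it to the test fold
        # 5. evaluate; this AUC is computed from hard 0/1 predictions
        acc[i].append(metrics.accuracy_score(y_test, pred))
        f1[i].append(metrics.f1_score(y_test, pred))
        auc[i].append(metrics.roc_auc_score(y_test, pred))

# Report the mean of each metric over the 10 folds
for i in range(len(algorithm)):
    print("Evaluate of {}:".format(algorithm[i]))
    print("accuracy:{}".format(sum(acc[i]) / len(acc[i])))
    print("f1-score:{}".format(sum(f1[i]) / len(f1[i])))
    print("AUC ROC:{}".format(sum(auc[i]) / len(auc[i])))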

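The output was as follows. With 2000 samples and 10 folds, each test fold holds 200 samples, and every accuracy below is a multiple of 1/200, so these figures appear to come from a single test fold rather than an average over all ten.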

Evaluate of GaussianNB:
accuracy:0.835
f1-score:0.8374384236453202
AUC ROC:0.8350000000000001

Evaluate of SVC[C=1e-02]:
accuracy:0.825
f1-score:0.8372093023255814
AUC ROC:0.825

Evaluate of SVC[C=1e-01]:
accuracy:0.875
f1-score:0.8756218905472637
AUC ROC:0.875

Evaluate of SVC[C=1e00]:
accuracy:0.895
f1-score:0.8985507246376813
AUC ROC:0.8950000000000001

Evaluate of SVC[C=1e01]:
accuracy:0.875
f1-score:0.8756218905472637
AUC ROC:0.875

Evaluate of SVC[C=1e02]:
accuracy:0.86
f1-score:0.8599999999999999
AUC ROC:0.86

Evaluate of RandomForestClassifier[n_estimators=10]:
accuracy:0.925
f1-score:0.9238578680203046
AUC ROC:0.925

Evaluate of RandomForestClassifier[n_estimators=100]:
accuracy:0.92
f1-score:0.9215686274509804
AUC ROC:0.92

Evaluate of RandomForestClassifier[n_estimators=1000]:
accuracy:0.92
f1-score:0.9215686274509804
AUC ROC:0.92
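In every result above, the AUC ROC value is essentially equal to the accuracy. A likely explanation, assuming `roc_auc_score` was given the hard 0/1 predictions rather than probability scores: with binary predictions the ROC curve has a single interior point (FPR, TPR), so the area reduces to (TPR + TNR) / 2, i.e. balanced accuracy, which equals plain accuracy when the test fold contains equally many samples of each class. A small check on illustrative toy labels (not data from the experiment):

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, roc_auc_score

# Illustrative toy data: 4 positives, 4 negatives
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0])  # hard 0/1 predictions: TPR = TNR = 0.75

auc_hard = roc_auc_score(y_true, y_pred)      # AUC computed from hard labels
bal_acc = balanced_accuracy_score(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)

print(auc_hard, bal_acc, acc)  # all three are 0.75 for this balanced toy example
```

Passing `clf.predict_proba(X_test)[:, 1]` (or `decision_function` for SVC) to `roc_auc_score` instead would measure how well each model ranks samples, which is what ROC AUC is normally meant to capture.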

With 2000 samples and 15 features, the results show that:

GaussianNB performs moderately (accuracy 0.835).

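SVC is sensitive to C. With C too small (1e-02) it is even less accurate than GaussianNB (0.825 vs. 0.835), and with C too large (1e02) its accuracy also drops, to 0.86. With a well-chosen C (1e00) it reaches a good accuracy of 0.895.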

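RandomForestClassifier achieves the highest accuracy (0.92 to 0.925), at the cost of the longest running time, but n_estimators has little effect on accuracy: 10, 100, and 1000 trees all score within 0.005 of one another.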
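These observations (SVC depends strongly on C; RandomForestClassifier barely depends on n_estimators but costs more time) can be re-checked with 10-fold averages and explicit timing. A sketch, assuming `cross_val_score` for accuracy, `time.perf_counter` for wall-clock time, and arbitrary `random_state` values; the forest grid is shortened to 10 and 100 trees to keep it quick, and no particular numbers are claimed here:

```python
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(
    n_samples=2000, n_features=15, n_informative=2, n_redundant=2,
    n_repeated=0, n_classes=2, random_state=0,
)
cv = KFold(n_splits=10, shuffle=True, random_state=0)

def evaluate(model):
    """Return (mean 10-fold accuracy, wall-clock seconds) for one model."""
    start = time.perf_counter()
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    return scores.mean(), time.perf_counter() - start

results = {}
for C in (1e-02, 1e-01, 1e00, 1e01, 1e02):
    results["SVC[C={:g}]".format(C)] = evaluate(SVC(C=C))
for n in (10, 100):
    results["RandomForestClassifier[n_estimators={}]".format(n)] = evaluate(
        RandomForestClassifier(n_estimators=n, random_state=0))

for name, (acc, seconds) in results.items():
    print("{:<45} accuracy={:.3f}  time={:.2f}s".format(name, acc, seconds))
```

Averaging over all ten folds smooths out the one-sample (0.005) differences that a single 200-sample fold can show, so small gaps between settings become easier to judge.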
