Assignment requirements:
1. Create a classification dataset (n_samples ≥ 1000, n_features ≥ 10)
2. Split the dataset using 10-fold cross-validation
3. Train the algorithms:
   GaussianNB
   SVC (possible C values [1e-02, 1e-01, 1e00, 1e01, 1e02], RBF kernel)
   RandomForestClassifier (possible n_estimators values [10, 100, 1000])
4. Evaluate the cross-validated performance: accuracy, F1-score and ROC AUC
5. Write a short report summarizing the methodology and the results
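Item 3 lists candidate values for C and n_estimators without saying how to choose among them. One possible approach is a grid search scored by cross-validated accuracy. The following is a minimal sketch under that assumption; the helper names `make_data` and `best_params`, the fixed seeds, and the use of accuracy as the selection criterion are illustrative choices, not part of the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

# Candidate values copied from item 3 of the assignment.
C_VALUES = [1e-02, 1e-01, 1e00, 1e01, 1e02]
N_ESTIMATORS_VALUES = [10, 100, 1000]


def make_data(seed=0):
    """Hypothetical helper: a dataset matching item 1 (1000 samples, 10 features)."""
    return make_classification(n_samples=1000, n_features=10, n_informative=2,
                               n_redundant=2, n_repeated=0, n_classes=2,
                               random_state=seed)


def best_params(estimator, param_grid, X, y, n_splits=10, seed=0):
    """Hypothetical helper: pick the setting with the highest mean CV accuracy."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    search = GridSearchCV(estimator, param_grid, scoring="accuracy", cv=cv)
    search.fit(X, y)
    return search.best_params_, search.best_score_
```

For example, `best_params(SVC(kernel="rbf"), {"C": C_VALUES}, *make_data())` searches the five C values, and `best_params(RandomForestClassifier(), {"n_estimators": N_ESTIMATORS_VALUES}, *make_data())` does the same for the forest sizes. The second call is noticeably slower because it fits forests of 1000 trees on every fold.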
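Use sklearn to build a dataset for a classification problem, train three different machine learning methods on it, and compute the accuracy, F1 score, and ROC AUC (area under the receiver operating characteristic curve) of each method.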
The code is as follows:
from sklearn import datasets
from sklearn.model_selection import KFold  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

# Synthetic binary classification data: dataset[0] holds the features, dataset[1] the labels
dataset = datasets.make_classification(n_samples=1000, n_features=10, n_informative=2, n_redundant=2, n_repeated=0, n_classes=2)

# 10-fold split with shuffling
kf = KFold(n_splits=10, shuffle=True)
for train_index, test_index in kf.split(dataset[0]):
    X_train, y_train = dataset[0][train_index], dataset[1][train_index]
    X_test, y_test = dataset[0][test_index], dataset[1][test_index]
# The loop only assigns the split, so the models below are trained and scored on the last fold

# Gaussian naive Bayes
clf = GaussianNB()
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print("gaussian nb")
acc = metrics.accuracy_score(y_test, pred)
print('acc: '+str(acc))
f1 = metrics.f1_score(y_test, pred)
print('f1: '+str(f1))
# AUC is computed from the hard class predictions here, not from scores or probabilities
auc = metrics.roc_auc_score(y_test, pred)
print('auc: '+str(auc))

# Support vector classifier with an RBF kernel
clf = SVC(C=1e-01, kernel='rbf', gamma=0.1)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print("\nsvc")
acc = metrics.accuracy_score(y_test, pred)
print('acc: '+str(acc))
f1 = metrics.f1_score(y_test, pred)
print('f1: '+str(f1))
auc = metrics.roc_auc_score(y_test, pred)
print('auc: '+str(auc))

# Random forest (n_estimators=6 is not one of the candidate values in the assignment)
clf = RandomForestClassifier(n_estimators=6)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print("\nrandom forest")
acc = metrics.accuracy_score(y_test, pred)
print('acc: '+str(acc))
f1 = metrics.f1_score(y_test, pred)
print('f1: '+str(f1))
auc = metrics.roc_auc_score(y_test, pred)
print('auc: '+str(auc))
The final output is as follows:
gaussian nb
acc: 0.89
f1: 0.9059829059829059
auc: 0.8881769326167839

svc
acc: 0.92
f1: 0.9333333333333333
auc: 0.9136006614303431

random forest
acc: 0.93
f1: 0.9391304347826087
auc: 0.9332368747416288
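The numbers above come from a single train/test split. To report cross-validated performance as item 4 asks, the three metrics can be averaged over all 10 folds. The following is a minimal sketch, assuming scikit-learn's `cross_validate` with its built-in `accuracy`, `f1` and `roc_auc` scorers. The function name `cross_validated_scores` and the fixed seeds are illustrative, and the hyperparameters are the ones used in the code above. Unlike that code, the `roc_auc` scorer ranks samples by decision scores or probabilities rather than hard predictions, so its values are not directly comparable with the AUC figures printed above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def cross_validated_scores(X, y, n_splits=10, seed=0):
    """Hypothetical helper: mean accuracy, F1 and ROC AUC over all folds per model."""
    models = {
        "GaussianNB": GaussianNB(),
        "SVC": SVC(C=1e-01, kernel="rbf", gamma=0.1),
        "RandomForestClassifier": RandomForestClassifier(n_estimators=6,
                                                         random_state=seed),
    }
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scoring = ["accuracy", "f1", "roc_auc"]
    results = {}
    for name, model in models.items():
        # cross_validate fits a fresh clone of the model on every fold
        scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
        results[name] = {m: scores["test_" + m].mean() for m in scoring}
    return results


if __name__ == "__main__":
    X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                               n_redundant=2, n_repeated=0, n_classes=2,
                               random_state=0)
    for name, means in cross_validated_scores(X, y).items():
        print(name, {m: round(v, 3) for m, v in means.items()})
```

Averaging over folds gives a more stable estimate than one split; the per-fold values in `scores` can also be used to report a standard deviation in the written summary.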