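The logic of grid search is very simple: it is a brute-force, exhaustive search. We define in advance a list of candidate values for each hyperparameter, then have the computer evaluate the model's performance for every combination of those values, and keep the combination that scores best.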
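To make the "try every combination" idea concrete, here is a minimal pure-Python sketch of the search loop. It is not how scikit-learn implements it. The helper `grid_search` and the scoring function `toy_score` are hypothetical names introduced for illustration, and `toy_score` merely stands in for "train the model with these hyperparameters and return its cross-validated score".

```python
from itertools import product


def grid_search(score_fn, grid):
    """Score every combination in `grid` and return the best one.

    grid:     dict mapping a hyperparameter name to the list of values to try
    score_fn: callable taking the hyperparameters as keyword arguments and
              returning a score, where higher is better
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    n_evaluated = 0
    # product(...) yields the Cartesian product of all candidate lists,
    # i.e. every possible combination exactly once.
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        n_evaluated += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_evaluated


def toy_score(C, gamma):
    # Hypothetical stand-in for a cross-validated model score; by
    # construction it is highest at C=10.0, gamma=0.01.
    return -((C - 10.0) ** 2 + (gamma - 0.01) ** 2)


grid = {"C": [0.1, 1.0, 10.0, 100.0], "gamma": [0.001, 0.01, 0.1]}
best_params, best_score, n_evaluated = grid_search(toy_score, grid)
print(best_params)   # the combination with the highest toy score
print(n_evaluated)   # 4 values of C x 3 values of gamma = 12 evaluations
```

The number of evaluations is the product of the list lengths, so the cost grows quickly as hyperparameters or candidate values are added. With k-fold cross-validation, each evaluation additionally means k model fits.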
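import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split, validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC

# Load the breast cancer data set (the file has no header row)
df = pd.read_csv('***\\wdbc.data', header=None)
print(df.head())

# Column 1 holds the class label; columns 2 onward hold the features
X = df.loc[:, 2:].values
y = df.loc[:, 1].values

# Encode the string class labels as integers
le = LabelEncoder()
y = le.fit_transform(y)

# Hold out 20% of the samples as a test set, stratified by class
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=1)
print(len(X_train))

# Pipeline: standardize the features, then fit a support vector machine (SVM)
pipe_svc = make_pipeline(StandardScaler(), SVC(random_state=1))

param_range = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
# The entries of param_grid did not survive in the original listing.
# The two dictionaries below are an assumed reconstruction: a linear kernel
# tuned over C, and an RBF kernel tuned over C and gamma. The 'svc__' prefix
# is the step name that make_pipeline assigns to the SVC estimator.
param_grid = [{'svc__C': param_range, 'svc__kernel': ['linear']},
              {'svc__C': param_range, 'svc__gamma': param_range,
               'svc__kernel': ['rbf']}]

# Evaluate every combination with 10-fold cross-validation on all CPU cores
gs = GridSearchCV(estimator=pipe_svc,
                  param_grid=param_grid,
                  scoring='accuracy',
                  cv=10,
                  n_jobs=-1)
gs = gs.fit(X_train, y_train)
print(gs.best_score_)
print(gs.best_params_)

# best_estimator_ is already refit on the full training set (refit=True by
# default), so the explicit fit below is redundant but harmless
clf = gs.best_estimator_
clf.fit(X_train, y_train)
print('test accuracy: %.3f' % clf.score(X_test, y_test))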
Output:
          0  1      2      3       4  ...      27      28      29      30       31
0    842302  M  17.99  10.38  122.80  ...  0.6656  0.7119  0.2654  0.4601  0.11890
1    842517  M  20.57  17.77  132.90  ...  0.1866  0.2416  0.1860  0.2750  0.08902
2  84300903  M  19.69  21.25  130.00  ...  0.4245  0.4504  0.2430  0.3613  0.08758
3  84348301  M  11.42  20.38   77.58  ...  0.8663  0.6869  0.2575  0.6638  0.17300
4  84358402  M  20.29  14.34  135.10  ...  0.2050  0.4000  0.1625  0.2364  0.07678
[5 rows x 32 columns]
455
0.9846859903381642
test accuracy: 0.974