Machine Learning: Model Selection and Tuning

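Cross-validation: the training data you are given is split further into a training set and a validation set. For example, split the data into 5 parts and use one part as the validation set. Then run 5 rounds, using a different part as the validation set each round. This yields 5 sets of model results, and their average is taken as the final result. This is called 5-fold cross-validation.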

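Note: in 5-fold cross-validation the training data is split into 5 parts; in each round 4 parts are used for training and 1 part for validation. The test set is held out separately and is not one of the 5 folds.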

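We already know that data is divided into a training set and a test set. **To make the results obtained from training more reliable,** the training set is split further for cross-validation.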

Purpose of cross-validation: to make the evaluation of the model more accurate and trustworthy.
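To make the fold rotation concrete, here is a minimal pure-Python sketch. The helper name `k_fold_indices` and the sample count of 10 are illustrative assumptions, not part of the original tutorial; in practice scikit-learn's `KFold` does this job.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) once per fold.

    Hypothetical helper for illustration only.
    """
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // n_folds] * n_folds
    for i in range(n_samples % n_folds):
        fold_sizes[i] += 1

    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# 10 samples, 5 folds: each round validates on a different 2 samples.
folds = list(k_fold_indices(10, 5))
for round_no, (train, validation) in enumerate(folds, start=1):
    print(f"round {round_no}: train={train} validate={validation}")
```

Every sample is used for validation exactly once across the 5 rounds, and the final score is the mean of the 5 per-round scores.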

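Many parameters usually have to be set by hand (such as the value of k in the k-nearest neighbors algorithm); these are called hyperparameters. Tuning them manually is tedious, so several hyperparameter combinations are preset for the model. Each combination is evaluated with cross-validation, and the best-performing combination is then used to build the model. This procedure is known as grid search.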
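The search described above can be sketched as an explicit loop before handing it over to `GridSearchCV`. This is only an illustration: the candidate k values are assumed, and the whole iris dataset is used without a held-out test set to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Assumed candidate values for the hyperparameter k.
candidate_ks = [1, 3, 5, 7, 9, 11]

mean_scores = {}
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k)
    # 3-fold cross-validation returns one accuracy per fold.
    fold_scores = cross_val_score(model, iris.data, iris.target, cv=3)
    mean_scores[k] = fold_scores.mean()

# Pick the k with the highest mean cross-validated accuracy.
best_k = max(mean_scores, key=mean_scores.get)
print("mean accuracy per k:", mean_scores)
print("best k:", best_k)
```

`GridSearchCV` automates this loop, handles several hyperparameters at once, and refits the best model on all of the training data.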

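# Load the iris dataset
from sklearn.datasets import load_iris
# Split the dataset; grid search with cross-validation
from sklearn.model_selection import GridSearchCV, train_test_split
# Feature engineering: standardization
from sklearn.preprocessing import StandardScaler
# k-nearest neighbors API
from sklearn.neighbors import KNeighborsClassifier


def knn_demo():
    """Demonstrate KNN classification on the iris dataset."""
    # Load the dataset
    iris = load_iris()
    # Split the dataset; arguments: features, targets, test fraction, random seed
    x_train, x_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.3, random_state=8)

    # Feature engineering: standardization
    # Instantiate a transformer
    transfer = StandardScaler()
    # Fit on the training data and transform it
    x_train = transfer.fit_transform(x_train)
    # The test set comes from the same dataset, so reuse the mean and
    # standard deviation fitted on the training set (transform only)
    x_test = transfer.transform(x_test)

    # Instantiate an estimator
    estimator = KNeighborsClassifier()

    # Model selection and tuning: grid search with cross-validation
    # Hyperparameters to tune. The values were lost in the source;
    # the candidate k values below are an assumed, illustrative grid.
    param_dict = {"n_neighbors": [1, 3, 5, 7, 9, 11]}
    estimator = GridSearchCV(estimator, param_grid=param_dict, cv=3)

    # Train the model on the training set
    estimator.fit(x_train, y_train)

    # Model evaluation
    # Method 1: compare the true values with the predicted values
    y_predict = estimator.predict(x_test)
    print("Predicted values:\n", y_predict)
    print("True values vs. predicted values:\n", y_predict == y_test)
    # Method 2: compute the accuracy directly
    print("Model accuracy:\n", estimator.score(x_test, y_test))

    # Results of grid search and cross-validation
    print("Best cross-validation score:\n", estimator.best_score_)
    print("Best estimator:\n", estimator.best_estimator_)
    print("Results of every cross-validation run:\n", estimator.cv_results_)
    return None


# Call the function to print the results.
knn_demo()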
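Printing `cv_results_` directly produces a large dictionary of arrays that is hard to read. A minimal sketch of a more readable summary follows; the parameter grid is an assumption, and for brevity the search is fitted on the full iris dataset rather than on a training split.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Assumed grid of k values; the original tutorial's grid is not shown.
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11]}
search = GridSearchCV(KNeighborsClassifier(), param_grid=param_grid, cv=3)
search.fit(iris.data, iris.target)

# cv_results_ is a dict of parallel arrays, one entry per parameter combination.
results = search.cv_results_
for params, mean, std, rank in zip(
    results["params"],
    results["mean_test_score"],
    results["std_test_score"],
    results["rank_test_score"],
):
    print(f"{params}: mean={mean:.3f} std={std:.3f} rank={rank}")

print("best parameters:", search.best_params_)
print("best mean score:", search.best_score_)
```

`best_score_` is the mean cross-validated score of the top-ranked combination, so it matches the largest value in `mean_test_score`.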

Split the dataset

Standardize the features

k-nearest neighbors prediction

Prediction workflow:

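import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# 1. Load the dataset
facebook = pd.read_csv('./fblocation/train.csv')

# 2. Basic data processing to obtain the features and the target
# 1) Narrow the data range (copy so that new columns can be added safely)
facebook = facebook.query("x > 1.0 & x <1.25 & y > 2.0 & y < 2.25").copy()
# 2) Extract useful time features
time_value = pd.to_datetime(facebook["time"], unit="s")
time_value = pd.DatetimeIndex(time_value)
facebook["day"] = time_value.day
facebook["hour"] = time_value.hour
facebook["weekday"] = time_value.weekday
# 3) Drop places with few check-ins
place_count = facebook.groupby("place_id").count()
place_count = place_count[place_count["row_id"] > 3]
facebook = facebook[facebook["place_id"].isin(place_count.index)]

# 4) Obtain the features x and the target y
x = facebook[["x", "y", "accuracy", "day", "hour", "weekday"]]
y = facebook["place_id"]

# 5) Split the dataset
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=6)

# 6) Feature engineering: standardization
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)

# 7) KNN estimator workflow
estimator = KNeighborsClassifier()
# The source omits the grid-search and fit steps even though best_score_
# is read below. The candidate k values are assumed; cv=3 mirrors the
# iris example.
param_dict = {"n_neighbors": [3, 5, 7, 9]}
estimator = GridSearchCV(estimator, param_grid=param_dict, cv=3)
estimator.fit(x_train, y_train)

# 8) Model evaluation
# Method 1: compare the true values with the predicted values
y_predict = estimator.predict(x_test)
print("Predicted values:\n", y_predict)
print("True values vs. predicted values:\n", y_predict == y_test)
# Method 2: compute the accuracy directly
score = estimator.score(x_test, y_test)
print("Accuracy:\n", score)

# 9) Results of cross-validation and grid search
print("Best cross-validation score:\n", estimator.best_score_)
print("Best estimator:\n", estimator.best_estimator_)
print("Results of every cross-validation run:\n", estimator.cv_results_)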
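The time-feature step in the workflow above is easier to follow on a tiny example. The sketch below uses three made-up timestamps (seconds since the Unix epoch) in place of the real check-in data, which is not reproduced here; the `checkins` frame is purely hypothetical.

```python
import pandas as pd

# Hypothetical timestamps in seconds: 0 h, 1 h and 25 h after 1970-01-01 00:00.
checkins = pd.DataFrame({"time": [0, 3600, 90000]})

# Convert seconds to datetimes, then wrap in a DatetimeIndex so the
# day / hour / weekday components are available as attributes.
time_value = pd.DatetimeIndex(pd.to_datetime(checkins["time"], unit="s"))
checkins["day"] = time_value.day
checkins["hour"] = time_value.hour
checkins["weekday"] = time_value.weekday  # Monday is 0, Sunday is 6

print(checkins)
```

Since 1 January 1970 was a Thursday, the first two rows get weekday 3 and the third row, one day later, gets weekday 4.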
