Data Mining Project (Part 5)

2021-09-12 09:26:32

Task: Model Tuning

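Use grid search to tune the five models (with five-fold cross-validation during tuning), then evaluate the models. Remember to show the output of running the code.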
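To make the five-fold cross-validation used during tuning concrete, here is a minimal pure-Python sketch of how the sample indices can be cut into five folds, each fold serving once as the validation set. The helper name `k_fold_indices` is made up for illustration. scikit-learn uses a stratified variant for classifiers when `cv=5` is passed; this simplified version ignores class labels.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds; yield (train, validation) index lists."""
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


# Ten samples, five folds: every sample is used for validation exactly once.
folds = list(k_fold_indices(10, k=5))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validation:", val_idx)
```

Each candidate parameter setting is trained five times, once per split, and its score is the average over the five validation folds.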

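Grid search is a hyperparameter tuning technique based on exhaustive search: loop over every candidate parameter combination, try each one, and keep the best-performing combination as the final result. The principle is the same as finding the maximum value in an array. (Why is it called grid search? Take a model with two parameters: if parameter a has 3 candidate values and parameter b has 4, listing every combination gives a 3×4 table. Each cell of the table is one grid point, and the loop amounts to visiting and searching every cell, hence the name grid search.)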
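The exhaustive loop described above can be sketched in a few lines of plain Python. The candidate values and the `toy_score` function are hypothetical stand-ins for a real model's validation score. Only the 3×4 grid shape is taken from the text.

```python
from itertools import product

# Hypothetical candidate values: parameter a has 3 options, parameter b has 4.
grid = {"a": [1, 2, 3], "b": [10, 20, 30, 40]}


def toy_score(a, b):
    """Stand-in for a model's validation score (an assumption for illustration)."""
    return -(a - 2) ** 2 - (b - 30) ** 2


def manual_grid_search(grid, score_fn):
    """Try every combination in the grid; return the best one, its score, and the number tried."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    n_tried = 0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        n_tried += 1
        # Same logic as finding the maximum of an array: keep the best seen so far.
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_tried


best_params, best_score, n_tried = manual_grid_search(grid, toy_score)
print(best_params, best_score, n_tried)
```

The loop visits all 3 × 4 = 12 cells of the table, which is also why grid search becomes expensive as parameters and candidate values are added.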
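```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# Settings shared by every search
cv = 5                # five-fold cross-validation
scoring = 'accuracy'
n_jobs = -1           # use all available CPU cores

# x_selected, y_train, x_test_selected and y_test are not defined in this post;
# they are assumed to exist already (selected training/test features and labels).

# Logistic regression tuning
model = LogisticRegression()
param_grid = {}  # candidate values are missing from the source; supply the grid here
grid_search = GridSearchCV(model, param_grid=param_grid, cv=cv, scoring=scoring, n_jobs=n_jobs, return_train_score=True)
grid_search.fit(x_selected, y_train)
train_score = grid_search.best_score_  # mean cross-validated score of the best candidate
test_score = grid_search.score(x_test_selected, y_test)
best_params = grid_search.best_params_
print(f'train score: {train_score}, test score: {test_score}, best params: {best_params}')

# SVM tuning
model = SVC()
param_grid = {}  # candidate values are missing from the source; supply the grid here
grid_search = GridSearchCV(model, param_grid=param_grid, cv=cv, scoring=scoring, n_jobs=n_jobs, return_train_score=True)
grid_search.fit(x_selected, y_train)
train_score = grid_search.best_score_
test_score = grid_search.score(x_test_selected, y_test)
best_params = grid_search.best_params_
print(f'train score: {train_score}, test score: {test_score}, best params: {best_params}')

# Decision tree tuning
model = DecisionTreeClassifier()
param_grid = {}  # candidate values are missing from the source; supply the grid here
grid_search = GridSearchCV(model, param_grid=param_grid, cv=cv, scoring=scoring, n_jobs=n_jobs, return_train_score=True)
grid_search.fit(x_selected, y_train)
train_score = grid_search.best_score_
test_score = grid_search.score(x_test_selected, y_test)
best_params = grid_search.best_params_
print(f'train score: {train_score}, test score: {test_score}, best params: {best_params}')

# Random forest tuning
model = RandomForestClassifier()
param_grid = {}  # candidate values are missing from the source; supply the grid here
grid_search = GridSearchCV(model, param_grid=param_grid, cv=cv, scoring=scoring, n_jobs=n_jobs, return_train_score=True)
grid_search.fit(x_selected, y_train)
train_score = grid_search.best_score_
test_score = grid_search.score(x_test_selected, y_test)
best_params = grid_search.best_params_
print(f'train score: {train_score}, test score: {test_score}, best params: {best_params}')

# XGBoost tuning
model = XGBClassifier()
param_grid = {}  # candidate values are missing from the source; supply the grid here
grid_search = GridSearchCV(model, param_grid=param_grid, cv=cv, scoring=scoring, n_jobs=n_jobs, return_train_score=True)
grid_search.fit(x_selected, y_train)
train_score = grid_search.best_score_
test_score = grid_search.score(x_test_selected, y_test)
best_params = grid_search.best_params_
print(f'train score: {train_score}, test score: {test_score}, best params: {best_params}')
```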
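Since the data and the parameter grids are not shown in this post, the following is a minimal self-contained sketch of the same tuning pattern that can be run as-is. The synthetic dataset from `make_classification`, the grid over `C`, and `max_iter=1000` are all assumptions chosen for illustration, not the values used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption): the project's loan dataset is not reproduced here.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Hypothetical grid, not the one used in the post.
param_grid = {"C": [0.1, 1.0, 10.0]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid=param_grid,
    cv=5,                # five-fold cross-validation on the training split
    scoring="accuracy",
)
search.fit(X_train, y_train)

cv_score = search.best_score_               # mean validation accuracy of the best candidate
test_score = search.score(X_test, y_test)   # accuracy of the refitted best model on held-out data
print(f"cv score: {cv_score:.4f}, test score: {test_score:.4f}, best params: {search.best_params_}")
```

Note that `best_score_` is the mean cross-validated score of the best candidate rather than the accuracy on the full training set, so a moderate gap between it and the test score is expected.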
