Algorithm Practice: Model Fusion

2021-09-08 08:39:18 · 4,354 characters · 3,517 reads

The best-performing models are GBDT and XGBoost (judged by accuracy and AUC).

The model evaluation code is as follows:
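The helper `model_evaluation` is called throughout but never shown in this excerpt. A hypothetical reconstruction, assuming scikit-learn and matching the five metrics printed in the results below:

```python
# Hypothetical reconstruction of model_evaluation: the original definition is
# not shown; this simply prints the five metrics that appear in the results.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def model_evaluation(y_true, y_pred, y_pred_pro):
    # Label-based metrics use hard predictions; AUC uses the probability
    # of the positive class.
    print("accuracy:", accuracy_score(y_true, y_pred))
    print("precision:", precision_score(y_true, y_pred))
    print("recall:", recall_score(y_true, y_pred))
    print("f1_score:", f1_score(y_true, y_pred))
    print("roc_auc_score:", roc_auc_score(y_true, y_pred_pro))
```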

# GBDT
# Predicted labels and probabilities on the training set
train_gbdt_predict = clf_gbdt.predict(x_train)
train_gbdt_predict_pro = clf_gbdt.predict_proba(x_train)[:, 1]
# Predicted labels and probabilities on the test set
test_gbdt_predict = clf_gbdt.predict(x_test)
test_gbdt_predict_pro = clf_gbdt.predict_proba(x_test)[:, 1]
# Training-set scores
model_evaluation(y_train, train_gbdt_predict, train_gbdt_predict_pro)
# Test-set scores
model_evaluation(y_test, test_gbdt_predict, test_gbdt_predict_pro)

Results:

******************** Training set
accuracy: 0.856026450255
precision: 0.865979381443
recall: 0.503597122302
f1_score: 0.636846095527
roc_auc_score: 0.909330778458

******************** Test set
accuracy: 0.771548703574
precision: 0.577464788732
recall: 0.342618384401
f1_score: 0.43006993007
roc_auc_score: 0.762693916727

# XGBoost
# Predicted labels and probabilities on the training set
train_xgb_predict = clf_xgb.predict(x_train)
train_xgb_predict_pro = clf_xgb.predict_proba(x_train)[:, 1]
# Predicted labels and probabilities on the test set
test_xgb_predict = clf_xgb.predict(x_test)
test_xgb_predict_pro = clf_xgb.predict_proba(x_test)[:, 1]
# Training-set scores
model_evaluation(y_train, train_xgb_predict, train_xgb_predict_pro)
# Test-set scores
model_evaluation(y_test, test_xgb_predict, test_xgb_predict_pro)

Results:

******************** Training set
accuracy: 0.848512173129
precision: 0.846638655462
recall: 0.483213429257
f1_score: 0.615267175573
roc_auc_score: 0.905284436711

******************** Test set
accuracy: 0.784162578837
precision: 0.624390243902
recall: 0.356545961003
f1_score: 0.45390070922
roc_auc_score: 0.769253440164

# LightGBM
# Predicted labels and probabilities on the training set
train_lgb_predict = clf_lgb.predict(x_train)
train_lgb_predict_pro = clf_lgb.predict_proba(x_train)[:, 1]
# Predicted labels and probabilities on the test set
test_lgb_predict = clf_lgb.predict(x_test)
test_lgb_predict_pro = clf_lgb.predict_proba(x_test)[:, 1]
# Training-set scores
model_evaluation(y_train, train_lgb_predict, train_lgb_predict_pro)
# Test-set scores
model_evaluation(y_test, test_lgb_predict, test_lgb_predict_pro)

Results:

******************** Training set
accuracy: 0.994289149384
precision: 1.0
recall: 0.97721822542
f1_score: 0.988477865373
roc_auc_score: 0.999994709407

******************** Test set
accuracy: 0.768745620182
precision: 0.564444444444
recall: 0.353760445682
f1_score: 0.434931506849
roc_auc_score: 0.749950444952

# Stacking with XGBoost as the first-level model
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import BayesianRidge

folds_stack = RepeatedKFold(n_splits=5, n_repeats=2, random_state=4590)
oof_stack = np.zeros(x_train.shape[0])
predictions = np.zeros(x_test.shape[0])

# XGBoost
clf_xgb = xgb.XGBClassifier()
for fold_, (trn_idx, val_idx) in enumerate(folds_stack.split(x_train, y_train)):
    print("fold {}".format(fold_))
    trn_data, trn_y = x_train.iloc[trn_idx], y_train.iloc[trn_idx].values
    val_data, val_y = x_train.iloc[val_idx], y_train.iloc[val_idx].values
    clf_xgb.fit(trn_data, trn_y)
    # First-level predictions become the second-level model's training,
    # validation, and test inputs; reshape to 2-D for sklearn
    meta_train_x = clf_xgb.predict(trn_data).reshape(-1, 1)
    meta_val_x = clf_xgb.predict(val_data).reshape(-1, 1)
    meta_test_x = clf_xgb.predict(x_test).reshape(-1, 1)
    clf_3 = BayesianRidge()
    clf_3.fit(meta_train_x, trn_y)
    oof_stack[val_idx] = clf_3.predict(meta_val_x)
    # 5 splits x 2 repeats = 10 folds, hence the division by 10
    predictions += clf_3.predict(meta_test_x) / 10

# Evaluate the fused model
model_evaluation(y_train, np.int64(oof_stack > 0.5), oof_stack)
model_evaluation(y_test, np.int64(predictions > 0.5), predictions)
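As a sanity check on the division by 10 above (assuming scikit-learn): `RepeatedKFold` with `n_splits=5` and `n_repeats=2` produces exactly 10 train/validation splits, so accumulating each fold's test prediction divided by 10 yields their average.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

folds = RepeatedKFold(n_splits=5, n_repeats=2, random_state=4590)
X = np.arange(20).reshape(10, 2)  # any small dummy matrix works

# Count the splits the iterator actually yields
n_folds = sum(1 for _ in folds.split(X))
print(n_folds)  # 10, matching folds.get_n_splits()
```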

Results:

******************** Training set (cross-validation)
accuracy: 0.793207093478
precision: 0.661504424779
recall: 0.358513189448
f1_score: 0.46500777605
roc_auc_score: 0.64548842274

******************** Test set
accuracy: 0.781359495445
precision: 0.614634146341
recall: 0.350974930362
f1_score: 0.446808510638
roc_auc_score: 0.682931154998

Problem:

Stacking did not improve performance. A likely cause is that the meta-features here are hard 0/1 labels rather than probabilities, and the base model also predicts on the very folds it was trained on, so the second-level model receives little new information.
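A common remedy is to feed the second-level model out-of-fold *probabilities* instead of hard labels. A minimal sketch of that variant, using scikit-learn's `GradientBoostingClassifier` and `LogisticRegression` as stand-ins for the models above (synthetic data; not the post's dataset):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold

# Synthetic stand-in data for illustration only
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
x_train, y_train = X[:300], y[:300]
x_test, y_test = X[300:], y[300:]

folds = RepeatedKFold(n_splits=5, n_repeats=2, random_state=4590)
oof = np.zeros(x_train.shape[0])       # out-of-fold meta-features
test_meta = np.zeros(x_test.shape[0])  # fold-averaged test meta-features

for trn_idx, val_idx in folds.split(x_train):
    base = GradientBoostingClassifier(random_state=0)
    base.fit(x_train[trn_idx], y_train[trn_idx])
    # Probabilities, not hard labels, and only on data this fold never saw
    oof[val_idx] = base.predict_proba(x_train[val_idx])[:, 1]
    test_meta += base.predict_proba(x_test)[:, 1] / folds.get_n_splits()

# Second-level model trained on out-of-fold probabilities
meta = LogisticRegression()
meta.fit(oof.reshape(-1, 1), y_train)
test_pred = meta.predict(test_meta.reshape(-1, 1))
```

Because the meta-features for each training sample come only from folds that excluded it, the second-level model sees an honest estimate of the base model's generalization behavior rather than its (near-perfect) training-fold output.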
