The best-performing models are GBDT and XGBoost (judged by accuracy and AUC).
Model evaluation is as follows:
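The model_evaluation helper is used throughout but never defined in this excerpt. A minimal sketch of what it presumably does, assuming it simply wraps the standard sklearn.metrics functions and prints the five scores shown in the results below (the asterisk header lines in the output are omitted from this sketch):

from sklearn import metrics

def model_evaluation(y_true, y_pred, y_pred_pro):
    # y_pred are hard labels; y_pred_pro is the positive-class column,
    # i.e. predict_proba(...)[:, 1], which is what roc_auc_score expects
    print('accuracy:', metrics.accuracy_score(y_true, y_pred))
    print('precision:', metrics.precision_score(y_true, y_pred))
    print('recall:', metrics.recall_score(y_true, y_pred))
    print('f1_score:', metrics.f1_score(y_true, y_pred))
    print('roc_auc_score:', metrics.roc_auc_score(y_true, y_pred_pro))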
#gbdt
# Predicted labels and probabilities on the training set
train_gbdt_predict = clf_gbdt.predict(x_train)
train_gbdt_predict_pro = clf_gbdt.predict_proba(x_train)[:,1]
# Predicted labels and probabilities on the test set
test_gbdt_predict = clf_gbdt.predict(x_test)
test_gbdt_predict_pro = clf_gbdt.predict_proba(x_test)[:,1]
# Training-set scores
model_evaluation(y_train,train_gbdt_predict,train_gbdt_predict_pro)
# Test-set scores
model_evaluation(y_test,test_gbdt_predict,test_gbdt_predict_pro)
Results:
********************Training set
accuracy: 0.856026450255
precision: 0.865979381443
recall: 0.503597122302
f1_score: 0.636846095527
roc_auc_score: 0.909330778458
********************Test set
accuracy: 0.771548703574
precision: 0.577464788732
recall: 0.342618384401
f1_score: 0.43006993007
roc_auc_score: 0.762693916727
#xgboost
# Predicted labels and probabilities on the training set
train_xgb_predict = clf_xgb.predict(x_train)
train_xgb_predict_pro = clf_xgb.predict_proba(x_train)[:,1]
# Predicted labels and probabilities on the test set
test_xgb_predict = clf_xgb.predict(x_test)
test_xgb_predict_pro = clf_xgb.predict_proba(x_test)[:,1]
# Training-set scores
model_evaluation(y_train,train_xgb_predict,train_xgb_predict_pro)
# Test-set scores
model_evaluation(y_test,test_xgb_predict,test_xgb_predict_pro)
Results:
********************Training set
accuracy: 0.848512173129
precision: 0.846638655462
recall: 0.483213429257
f1_score: 0.615267175573
roc_auc_score: 0.905284436711
********************Test set
accuracy: 0.784162578837
precision: 0.624390243902
recall: 0.356545961003
f1_score: 0.45390070922
roc_auc_score: 0.769253440164
#lightgbm
# Predicted labels and probabilities on the training set
train_lgb_predict = clf_lgb.predict(x_train)
train_lgb_predict_pro = clf_lgb.predict_proba(x_train)[:,1]
# Predicted labels and probabilities on the test set
test_lgb_predict = clf_lgb.predict(x_test)
test_lgb_predict_pro = clf_lgb.predict_proba(x_test)[:,1]
# Training-set scores
model_evaluation(y_train,train_lgb_predict,train_lgb_predict_pro)
# Test-set scores
model_evaluation(y_test,test_lgb_predict,test_lgb_predict_pro)
Results:
********************Training set
accuracy: 0.994289149384
precision: 1.0
recall: 0.97721822542
f1_score: 0.988477865373
roc_auc_score: 0.999994709407
********************Test set
accuracy: 0.768745620182
precision: 0.564444444444
recall: 0.353760445682
f1_score: 0.434931506849
roc_auc_score: 0.749950444952
Note the gap: LightGBM scores almost perfectly on the training set (precision 1.0, AUC ≈ 1.0) yet drops to a test AUC of about 0.75, a clear sign of overfitting, which is why it is not among the best performers here.
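One way to rein in that overfitting is early stopping on a held-out validation split. A minimal sketch, assuming clf_lgb is a lightgbm.LGBMClassifier (the callback API varies across lightgbm versions; older versions take early_stopping_rounds in fit() instead):

import lightgbm as lgb
from sklearn.model_selection import train_test_split

# Hold out a validation split from the training data (never the test set)
x_tr, x_val, y_tr, y_val = train_test_split(
    x_train, y_train, test_size=0.2, random_state=42, stratify=y_train)
clf_lgb = lgb.LGBMClassifier(n_estimators=1000, learning_rate=0.05)
clf_lgb.fit(x_tr, y_tr,
            eval_set=[(x_val, y_val)],
            eval_metric='auc',
            callbacks=[lgb.early_stopping(stopping_rounds=50)])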
# Stacking with XGBoost as the first-level model
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import BayesianRidge

folds_stack = RepeatedKFold(n_splits=5, n_repeats=2, random_state=4590)
oof_stack = np.zeros(x_train.shape[0])
predictions = np.zeros(x_test.shape[0])
#xgboost
clf_xgb = xgb.XGBClassifier()
for fold_, (trn_idx, val_idx) in enumerate(folds_stack.split(x_train,y_train)):
    print("fold {}".format(fold_))
    trn_data, trn_y = x_train.iloc[trn_idx], y_train.iloc[trn_idx].values
    val_data, val_y = x_train.iloc[val_idx], y_train.iloc[val_idx].values
    clf_xgb.fit(trn_data,trn_y)
    # First-level predictions become the second-level model's training,
    # validation and test inputs; reshaped to 2D for the sklearn estimator
    meta_train_x = clf_xgb.predict(trn_data).reshape(-1, 1)
    meta_val_x = clf_xgb.predict(val_data).reshape(-1, 1)
    meta_test_x = clf_xgb.predict(x_test).reshape(-1, 1)
    clf_3 = BayesianRidge()
    clf_3.fit(meta_train_x, trn_y)
    oof_stack[val_idx] = clf_3.predict(meta_val_x)
    predictions += clf_3.predict(meta_test_x) / 10  # average over 5 splits x 2 repeats = 10 folds
# Evaluate the stacked model: threshold the regression output at 0.5 to get labels
model_evaluation(y_train,np.int64(oof_stack>0.5),oof_stack)
model_evaluation(y_test,np.int64(predictions>0.5),predictions)
Results:
********************Training set (out-of-fold cross-validation)
accuracy: 0.793207093478
precision: 0.661504424779
recall: 0.358513189448
f1_score: 0.46500777605
roc_auc_score: 0.64548842274
********************Test set
accuracy: 0.781359495445
precision: 0.614634146341
recall: 0.350974930362
f1_score: 0.446808510638
roc_auc_score: 0.682931154998
Stacking did not improve the scores. One likely reason: the meta-features fed to the second-level model are hard 0/1 labels from predict() rather than probabilities, which discards most of the ranking information (note the training AUC dropped to 0.645).
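For comparison, scikit-learn's StackingClassifier can feed the base models' predicted probabilities (rather than hard labels) to the final estimator. A hedged sketch, not the original author's code, reusing the three classifiers defined above:

from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression

# stack_method='predict_proba' passes class probabilities as meta-features;
# cv=5 generates them out-of-fold, as in the manual loop above
stack = StackingClassifier(
    estimators=[('gbdt', clf_gbdt), ('xgb', clf_xgb), ('lgb', clf_lgb)],
    final_estimator=LogisticRegression(),
    stack_method='predict_proba',
    cv=5)
stack.fit(x_train, y_train)
model_evaluation(y_test, stack.predict(x_test), stack.predict_proba(x_test)[:, 1])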