Training an XGBoost model with sklearn

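XGBoost is a boosting-tree method whose algorithm is an improvement on GBDT. It also parallelizes the search for split points, so it trains faster. The xgboost package provides scikit-learn-compatible models for classification and regression (XGBClassifier and XGBRegressor). This article uses XGBoost to train a model for a binary classification problem.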
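The main difference from GBDT is the training objective. At each round, XGBoost fits the new tree f_t using a second-order Taylor expansion of the loss plus an explicit complexity penalty. The formulas below summarize that objective as described in the XGBoost documentation ("Introduction to Boosted Trees"), with constant terms dropped. T is the number of leaves, w_j is the weight of leaf j, and I_j is the set of samples in leaf j. γ corresponds to the `gamma` parameter in the code. H_j is the hessian sum that `min_child_weight` bounds. λ is `reg_lambda`, which the code does not set; its default is 1.

```latex
\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n} \left[ g_i f_t(\mathbf{x}_i) + \tfrac{1}{2} h_i f_t^2(\mathbf{x}_i) \right] + \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2

g_i = \partial_{\hat{y}_i^{(t-1)}} \, l\big(y_i, \hat{y}_i^{(t-1)}\big),
\qquad
h_i = \partial^2_{\hat{y}_i^{(t-1)}} \, l\big(y_i, \hat{y}_i^{(t-1)}\big)

w_j^{*} = -\frac{G_j}{H_j + \lambda},
\qquad
G_j = \sum_{i \in I_j} g_i, \quad H_j = \sum_{i \in I_j} h_i

\text{Gain} = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma
```

Because γ is subtracted from every candidate split's gain, raising `gamma` makes the trees more conservative.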

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('data.csv')
# label: the first column (df.ix was removed from pandas; use positional iloc)
label = df.iloc[:, 0]
# features: columns 1 to 5
features = df.iloc[:, [1, 2, 3, 4, 5]]
# split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(features, label, test_size=0.2, random_state=3)

from sklearn import metrics
from xgboost import XGBClassifier

model = XGBClassifier(
    learning_rate=0.01,
    n_estimators=10,         # number of trees: build the model from 10 trees
    max_depth=4,             # maximum depth of each tree
    min_child_weight=1,      # minimum sum of instance weight (hessian) required in a child node
    gamma=0.,                # coefficient of the leaf-count term in the penalty (minimum loss reduction to split)
    subsample=1,             # use all samples to build each tree
    colsample_bytree=1,      # use all features to build each tree
    scale_pos_weight=1,      # reweights positive samples to handle class imbalance
    random_state=27,         # random seed
    verbosity=0,             # silence output (replaces the deprecated `silent` argument)
)
model.fit(x_train, y_train)
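The code above leaves `scale_pos_weight=1`. For an imbalanced binary problem, the XGBoost parameter documentation suggests sum(negative instances) / sum(positive instances) as a typical value to consider. Below is a minimal sketch of computing that ratio from the training labels. It assumes labels are coded 0/1, and the helper name `suggest_scale_pos_weight` is hypothetical, not part of xgboost.

```python
def suggest_scale_pos_weight(labels):
    """Return (#negatives / #positives) as a starting value for scale_pos_weight.

    Hypothetical helper: assumes binary labels coded as 0 (negative) and 1 (positive).
    """
    labels = list(labels)
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive samples; the ratio is undefined")
    return negatives / positives


# 90 negatives and 10 positives give a ratio of 9.0
print(suggest_scale_pos_weight([0] * 90 + [1] * 10))
```

With the split above, `suggest_scale_pos_weight(y_train)` would give a value to try for `scale_pos_weight`. Treat it as a starting point to tune, not a fixed rule.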

# evaluate: accuracy on the test set, AUC on both the training and test sets
y_pred = model.predict(x_test)
print("accuracy : %.4g" % metrics.accuracy_score(y_test, y_pred))
y_train_proba = model.predict_proba(x_train)[:, 1]
print("auc score (train): %f" % metrics.roc_auc_score(y_train, y_train_proba))
y_proba = model.predict_proba(x_test)[:, 1]
print("auc score (test): %f" % metrics.roc_auc_score(y_test, y_proba))
