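XGBoost is a gradient tree boosting method. Its algorithm is an improvement on GBDT, and it parallelizes parts of the computation, which makes training faster. The xgboost package provides scikit-learn-compatible models for classification and regression (XGBClassifier and XGBRegressor); this article trains an XGBClassifier on a binary classification problem.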
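To make the boosting idea concrete, here is a minimal sketch in plain Python. It is not XGBoost's actual algorithm: it is ordinary first-order gradient boosting with one-split trees (stumps) on a single feature, with no regularization and no parallelism. The toy data and the helper names (`fit_stump`, `boost`, `predict_proba`) are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, residuals):
    """Find the threshold on one feature that best fits the residuals (least squares)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must leave samples on both sides
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2], best[3]

def boost(xs, ys, n_estimators=10, learning_rate=0.5):
    """Fit stumps one after another, each on the errors left by the previous ones."""
    scores = [0.0] * len(xs)  # raw scores (log-odds), start at 0
    stumps = []
    for _ in range(n_estimators):
        # negative gradient of the log loss with respect to the score: y - p
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        t, lm, rm = fit_stump(xs, residuals)
        # store leaf values already scaled by the learning rate
        stump = (t, learning_rate * lm, learning_rate * rm)
        stumps.append(stump)
        scores = [s + (stump[1] if x <= t else stump[2]) for x, s in zip(xs, scores)]
    return stumps

def predict_proba(stumps, x):
    """Sum the contributions of all stumps and map the score to a probability."""
    score = sum(left if x <= t else right for t, left, right in stumps)
    return sigmoid(score)

# hypothetical toy data: one feature, label is 1 when the feature is large
xs = [0.1, 0.4, 0.5, 0.9, 1.3, 1.7, 2.0, 2.4]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

stumps = boost(xs, ys)
probs = [predict_proba(stumps, x) for x in xs]
for x, y, p in zip(xs, ys, probs):
    print("x=%.1f  label=%d  p(positive)=%.3f" % (x, y, p))
```

Each round adds a small tree that corrects the current model's errors, which is the role played by `n_estimators` and `learning_rate` in the XGBClassifier configuration used in this article.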
import pandas as pd
from sklearn.model_selection import train_test_split
df = pd.read_csv('data.csv')
# label: the first column (df.ix was removed from pandas, so use iloc)
label = df.iloc[:, 0]
# features: columns 1 through 5
features = df.iloc[:, [1, 2, 3, 4, 5]]
# split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(features, label, test_size=0.2, random_state=3)
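import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier
from xgboost import plot_importance
from sklearn import metrics
model = XGBClassifier(
    learning_rate=0.01,
    n_estimators=10,       # number of trees: the model is built from 10 trees
    max_depth=4,           # maximum depth of each tree
    min_child_weight=1,    # minimum sum of instance weights required in a child node
    gamma=0.,              # coefficient on the number of leaves in the penalty term
    subsample=1,           # use all samples to build each tree
    colsample_bytree=1,    # use all features to build each tree
    scale_pos_weight=1,    # weight of the positive class, for imbalanced data
    random_state=27,       # random seed
    verbosity=1,           # replaces the deprecated silent=0 (0 = silent, 1 = warnings)
)
model.fit(x_train, y_train)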
# evaluate on the test set
y_pred = model.predict(x_test)
print("accuracy : %.4g" % metrics.accuracy_score(y_test, y_pred))
# AUC on the training set, using the predicted probability of the positive class
y_train_proba = model.predict_proba(x_train)[:, 1]
print("auc score (train): %f" % metrics.roc_auc_score(y_train, y_train_proba))
# AUC on the test set
y_proba = model.predict_proba(x_test)[:, 1]
print("auc score (test): %f" % metrics.roc_auc_score(y_test, y_proba))
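As a rough illustration of what the printed AUC score measures, the sketch below computes it by hand: AUC is the fraction of (positive, negative) pairs in which the positive sample receives the higher predicted probability, with ties counted as half. The function name `auc_score` and the four-sample data are made up for illustration; for real work `metrics.roc_auc_score` is the better choice, since this pairwise version is quadratic in the number of samples.

```python
def auc_score(y_true, y_score):
    """Pairwise AUC: the chance that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))

# hypothetical labels and predicted probabilities of the positive class
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print("auc score: %f" % auc_score(y_true, y_score))  # prints "auc score: 0.750000"
```

Because AUC depends only on how the samples are ranked, it is computed from `predict_proba` rather than from the hard labels returned by `predict`.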