1. Feature importance scores
Alternatively, xgboost already ships with a built-in function that does the ranking for you, which is more convenient.
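As a quick illustration (a minimal sketch, assuming a fitted XGBClassifier named model like the one in the complete example further down), you can read the raw scores from feature_importances_ yourself, or let xgboost's plot_importance draw the ranked chart:

# rank the raw importance scores by hand
from numpy import argsort
for i in argsort(model.feature_importances_)[::-1]:
    print("feature %d: importance %.3f" % (i, model.feature_importances_[i]))
# or let xgboost rank and plot them directly
from xgboost import plot_importance
from matplotlib import pyplot
plot_importance(model)
pyplot.show()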
2. Feature selection
SelectFromModel
For example (remember to transform the data first and only then feed it to the selection model):
# select features using threshold
selection = SelectFromModel(model, threshold=thresh, prefit=True)
select_x_train = selection.transform(x_train)
# train model
selection_model = XGBClassifier()
selection_model.fit(select_x_train, y_train)
# eval model
select_x_test = selection.transform(x_test)
y_pred = selection_model.predict(select_x_test)
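Here prefit=True tells SelectFromModel to reuse the already-fitted model rather than refitting it, so only transform is called to drop the columns whose importance falls below the threshold.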
The complete code:
# use feature importance for feature selection
from numpy import loadtxt
from numpy import sort
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.feature_selection import SelectFromModel
# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into x and y
x = dataset[:,0:8]
y = dataset[:,8]
# split data into train and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=7)
# fit model on all training data
model = XGBClassifier()
model.fit(x_train, y_train)
# make predictions for test data and evaluate
y_pred = model.predict(x_test)
predictions = [round(value) for value in y_pred]
accuracy = accuracy_score(y_test, predictions)
print("accuracy: %.2f%%" % (accuracy * 100.0))
# fit model using each importance as a threshold
thresholds = sort(model.feature_importances_)
for thresh in thresholds:
    # select features using threshold
    selection = SelectFromModel(model, threshold=thresh, prefit=True)
    select_x_train = selection.transform(x_train)
    # train model
    selection_model = XGBClassifier()
    selection_model.fit(select_x_train, y_train)
    # eval model
    select_x_test = selection.transform(x_test)
    y_pred = selection_model.predict(select_x_test)
    predictions = [round(value) for value in y_pred]
    accuracy = accuracy_score(y_test, predictions)
    print("thresh=%.3f, n=%d, accuracy: %.2f%%" % (thresh, select_x_train.shape[1], accuracy*100.0))
The idea is that we need to set a threshold that decides whether a feature gets selected. This example sorts the importance scores, then tries each one in turn as the threshold and checks which gives the best accuracy. The output is:
accuracy: 77.95%
thresh=0.071, n=8, accuracy: 77.95%
thresh=0.073, n=7, accuracy: 76.38%
thresh=0.084, n=6, accuracy: 77.56%
thresh=0.090, n=5, accuracy: 76.38%
thresh=0.128, n=4, accuracy: 76.38%
thresh=0.160, n=3, accuracy: 74.80%
thresh=0.186, n=2, accuracy: 71.65%
thresh=0.208, n=1, accuracy: 63.78%
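In this run all 8 features score best, while 6 features are almost as accurate. If you also want the loop to return the winning threshold instead of only printing each one, a minimal sketch (the best_thresh / best_acc / best_n names are just illustrative) is:

# track the threshold that gives the highest test accuracy
best_thresh, best_acc, best_n = None, 0.0, 0
for thresh in thresholds:
    selection = SelectFromModel(model, threshold=thresh, prefit=True)
    select_x_train = selection.transform(x_train)
    selection_model = XGBClassifier()
    selection_model.fit(select_x_train, y_train)
    select_x_test = selection.transform(x_test)
    y_pred = selection_model.predict(select_x_test)
    acc = accuracy_score(y_test, [round(v) for v in y_pred])
    if acc > best_acc:
        best_thresh, best_acc, best_n = thresh, acc, select_x_train.shape[1]
print("best: thresh=%.3f, n=%d, accuracy: %.2f%%" % (best_thresh, best_n, best_acc * 100.0))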