This exercise uses the same dataset as before, data_all.csv.
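Running the finished program produces several warnings: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use array.size > 0 to check that an array is not empty. if diff:
In other words, the truth value of an empty array is ambiguous: it currently evaluates to False, but a future version will raise an error instead. Use array.size > 0 to check that an array is not empty. (The trailing "if diff:" is the source line that triggered the warning.)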
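Fix: ignore the warning. It appears because NumPy deprecated truth-value checks on empty arrays, so it can simply be ignored by adding the following code:
import warnings
warnings.filterwarnings("ignore")
The warnings above then disappear. Note that this call suppresses all warnings, not only this one.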
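Since a bare "ignore" filter hides every warning, a narrower filter may be preferable. The following is a minimal sketch, assuming the warning is raised as a DeprecationWarning whose message starts with the text quoted above. The helper emit_demo_warnings is a hypothetical stand-in for the library code that raises it, and record=True is used only so the demo can inspect which warnings get through.

```python
import warnings


def emit_demo_warnings():
    """Hypothetical stand-in for library code that raises two kinds of warnings."""
    warnings.warn(
        "The truth value of an empty array is ambiguous.",
        DeprecationWarning,
    )
    warnings.warn("something else worth seeing", UserWarning)


def run_with_narrow_filter():
    # record=True collects warnings in a list so we can inspect what got through.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # start from "show everything"
        # Ignore only DeprecationWarnings whose message starts with this text.
        warnings.filterwarnings(
            "ignore",
            message="The truth value of an empty array is ambiguous",
            category=DeprecationWarning,
        )
        emit_demo_warnings()
    return [str(w.message) for w in caught]


remaining = run_with_narrow_filter()
print(remaining)
```

In a real script, the single warnings.filterwarnings(...) call with message and category would replace the blanket "ignore" call, so unrelated warnings remain visible.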
# Import packages
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
import pandas as pd
import warnings

# Ignore the warning caused by NumPy deprecating truth-value checks on empty arrays.
# Warning text: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty. if diff:
warnings.filterwarnings("ignore")

# Read the data
data_all = pd.read_csv('data_all.csv')
print("Data shape (rows, columns):", data_all.shape)

# Split the dataset
x = data_all.drop(['status'], axis=1)
# The 'status' column is the label
y = data_all['status']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=2018)

# Build the models
# 1. Random forest
rfc = RandomForestClassifier()
rfc.fit(x_train, y_train)
rfc_score = rfc.score(x_test, y_test)

# 2. GBDT
gbc = GradientBoostingClassifier()
gbc.fit(x_train, y_train)
gbc_score = gbc.score(x_test, y_test)

# 3. XGBoost
xgbc = XGBClassifier()
xgbc.fit(x_train, y_train)
xgbc_score = xgbc.score(x_test, y_test)

# 4. LightGBM
lgbc = LGBMClassifier()
lgbc.fit(x_train, y_train)
lgbc_score = lgbc.score(x_test, y_test)

print("RandomForestClassifier acc: %f, GradientBoostingClassifier acc: %f" % (rfc_score, gbc_score))
print("XGBClassifier acc: %f, LGBMClassifier acc: %f" % (xgbc_score, lgbc_score))
RandomForestClassifier acc: 0.763139, GradientBoostingClassifier acc: 0.779958
XGBClassifier acc: 0.785564, LGBMClassifier acc: 0.770147
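The four fit-and-score steps follow the same pattern, so they could also be written as a loop. This is a rough sketch and not the original author's code: evaluate_models and MajorityClassifier are hypothetical names introduced for illustration, and the toy data below is not the loan dataset. It relies only on the scikit-learn convention that a model exposes fit(X, y) and score(X, y).

```python
def evaluate_models(models, x_train, y_train, x_test, y_test):
    """Fit each model and return {name: test accuracy}.

    Works with any object exposing fit(X, y) and score(X, y),
    which is the scikit-learn estimator convention.
    """
    scores = {}
    for name, model in models.items():
        model.fit(x_train, y_train)
        scores[name] = model.score(x_test, y_test)
    return scores


class MajorityClassifier:
    """Hypothetical toy model: always predicts the most common training label."""

    def fit(self, X, y):
        labels = list(y)
        self.label_ = max(set(labels), key=labels.count)
        return self

    def score(self, X, y):
        # Accuracy of always predicting the stored majority label.
        labels = list(y)
        return sum(1 for v in labels if v == self.label_) / len(labels)


# Toy data for the demo only, not the loan dataset.
toy_scores = evaluate_models(
    {"majority": MajorityClassifier()},
    x_train=[[0], [1], [2], [3]], y_train=[0, 0, 0, 1],
    x_test=[[4], [5]], y_test=[0, 1],
)

# Print the models from highest to lowest accuracy.
for name, acc in sorted(toy_scores.items(), key=lambda kv: kv[1], reverse=True):
    print("%s acc: %f" % (name, acc))
```

With the real data, the dictionary would presumably hold the four classifiers keyed by name, for example {"RandomForestClassifier": RandomForestClassifier(), ...}, and the same x_train, x_test, y_train, y_test split would be passed in.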