Python random forest hyperparameter tuning

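Two days ago I wrote up a first look at scikit-learn. Today, with some free time, I followed a tutorial and built a random forest (RF) model as well. At first I didn't understand Python list comprehensions, but after thinking them through I find them quite handy.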
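As a small illustration of the list comprehension mentioned above, here is a sketch that picks every column except the label out of a DataFrame. The toy frame and its values are made up for the example; only the column names follow the data described in this post.

```python
import pandas as pd

# Toy data invented for illustration; only the column names mirror this post.
train = pd.DataFrame({
    "f1": [1, 2, 3],
    "f2": [4, 5, 6],
    "result": [0, 1, 0],
})

# List comprehension: keep each column name x unless it is the label column.
x_col = [x for x in train.columns if x != "result"]

# The equivalent explicit loop, for comparison.
x_col_loop = []
for name in train.columns:
    if name != "result":
        x_col_loop.append(name)

print(x_col)
```

The comprehension and the loop build the same list; the comprehension just says it in one line.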

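Then I used GridSearchCV, a parameter optimization class that iterates over many parameter combinations (in other words, a brute-force search for the best combination) and uses cross-validation to decide which parameters perform best.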
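To make the brute-force search concrete, here is a minimal sketch on synthetic data. The dataset from `make_classification`, the grid values, and `cv=3` are all assumptions chosen to keep the example small; they are not the settings used in this post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real data (8 features, binary label).
X, y = make_classification(n_samples=300, n_features=8, random_state=10)

# Hypothetical grid: 2 x 2 = 4 combinations, each scored with 3-fold CV.
param_grid = {"n_estimators": [10, 30], "max_depth": [3, 5]}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=10),
    param_grid=param_grid,
    scoring="roc_auc",
    cv=3,
)
search.fit(X, y)

# Mean cross-validated AUC for every combination that was tried.
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(params, round(score, 4))

print("best:", search.best_params_, round(search.best_score_, 4))
```

Every combination in the grid is fitted once per fold, so the cost grows with the product of the list lengths. That is why the script in this post tunes a few parameters at a time instead of all of them at once.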

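So after tuning, the model may fit the training data less closely but generalize better? That would make sense, since the search picks parameters by their cross-validated score on held-out folds rather than by how well they fit the training data.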
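One way to probe that question is to compare the training score with a held-out score for a default forest and for a constrained one. This is only a sketch: the data is synthetic and noisy on purpose, and the constraint values (`max_depth=4`, `min_samples_leaf=20`) are hypothetical, not the tuned values from this post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, noisy data; flip_y randomly relabels some samples.
X, y = make_classification(n_samples=600, n_features=8, flip_y=0.1,
                           random_state=10)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=66)

models = {
    "default": RandomForestClassifier(random_state=10),
    # Hypothetical constraints that limit how closely each tree can fit.
    "constrained": RandomForestClassifier(max_depth=4, min_samples_leaf=20,
                                          random_state=10),
}

# Record (training accuracy, test accuracy) for each model.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (model.score(X_train, y_train),
                    model.score(X_test, y_test))
    print(name, "train=%.3f test=%.3f" % scores[name])
```

A large gap between the two numbers for a model suggests overfitting. Whether the constrained model actually generalizes better depends on the data, so it is worth checking rather than assuming.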

The last step splits the data into a training set and a test set.
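A minimal sketch of such a split, using made-up arrays. The `test_size=0.5` and `random_state=66` values match the script in this post; `stratify=y` is an optional addition of mine that the script does not use.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples, 2 features, balanced binary labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Hold out half of the samples; random_state makes the split repeatable,
# and stratify=y keeps the class ratio roughly the same in both halves.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=66, stratify=y)

print(X_train.shape, X_test.shape)
```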

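One last note: the data is the same as in the scikit-learn introductory post, but with column labels added, that is, a header line f1,f2,f3,f4,f5,f6,f7,f8,result inserted as the first row of the file.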
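For reference, here are two ways to get those column names into pandas. The two data rows are made up, and reading from an in-memory string instead of the real CSV file is an assumption made to keep the sketch self-contained.

```python
import io
import pandas as pd

# Two made-up rows standing in for the original file without a header.
raw = "1,2,3,4,5,6,7,8,0\n8,7,6,5,4,3,2,1,1\n"
names = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "result"]

# Option 1: prepend the header line to the text, as described above.
with_header = pd.read_csv(io.StringIO(",".join(names) + "\n" + raw))

# Option 2: leave the data untouched and pass the names to pandas.
named = pd.read_csv(io.StringIO(raw), header=None, names=names)

print(with_header.equals(named))
```

Option 2 avoids editing the data file by hand, which helps when the file is regenerated often.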

# coding=utf-8
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
# GridSearchCV and train_test_split live in sklearn.model_selection (scikit-learn 0.18+)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn import metrics
import matplotlib.pylab as plt
import joblib

# Raw string so the backslashes in the Windows path are not treated as escapes
train = pd.read_csv(r"c:\users\administrator\desktop\hh_practice.csv")
# print(train.head(10))
# print(train['result'][:10])

# Class distribution of the label column
print('Class counts')
target = "result"
print(train[target].value_counts())

# Column names of the samples
print('Sample features')
print(train.columns)

# Separate the features from the class label
x_col = [x for x in train.columns if x != target]
x = train[x_col]
y = train[target]

# Baseline model with default parameters
rf_model = RandomForestClassifier()
rf_model.fit(x, y)
expected = y
# predicted = rf_model.predict(x)
# # Results
# print(metrics.classification_report(expected, predicted))
# print(metrics.confusion_matrix(expected, predicted))

# Predicted class probabilities on the training data
y_predprob = rf_model.predict_proba(x)
print(y_predprob)

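# Parameter search ranges
# The grid itself is missing from the source. The estimator below fixes every
# parameter except n_estimators, so that is presumably what was searched.
param_test1 = ...
gsearch1 = GridSearchCV(
    estimator=RandomForestClassifier(min_samples_split=100, min_samples_leaf=20,
                                     max_depth=8, max_features='sqrt', random_state=10),
    param_grid=param_test1, scoring='roc_auc', cv=5)
gsearch1.fit(x, y)
print('First round of parameter tuning')
print(gsearch1.cv_results_['params'])
print(gsearch1.cv_results_['mean_test_score'])
print(gsearch1.best_params_)
print(gsearch1.best_score_)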

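# Grid missing from the source. max_depth and min_samples_split are left unset
# in the estimator below, so they were presumably the parameters searched.
param_test2 = ...
gsearch2 = GridSearchCV(
    estimator=RandomForestClassifier(n_estimators=70, min_samples_leaf=20,
                                     oob_score=True, random_state=10),
    param_grid=param_test2, scoring='roc_auc', cv=5)
gsearch2.fit(x, y)
print('Second round of parameter tuning')
print(gsearch2.cv_results_['params'])
print(gsearch2.cv_results_['mean_test_score'])
print(gsearch2.best_params_)
print(gsearch2.best_score_)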

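# Grid missing from the source. min_samples_leaf is left unset in the estimator
# below, so it was presumably among the parameters searched.
param_test3 = ...
gsearch3 = GridSearchCV(
    estimator=RandomForestClassifier(n_estimators=70, max_depth=7, min_samples_split=50,
                                     oob_score=True, random_state=10),
    param_grid=param_test3, scoring='roc_auc', cv=5)
gsearch3.fit(x, y)
print('Third round of parameter tuning')
print(gsearch3.cv_results_['params'])
print(gsearch3.cv_results_['mean_test_score'])
# Report the third search here (the original printed gsearch2 again)
print(gsearch3.best_params_)
print(gsearch3.best_score_)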

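# Grid missing from the source. max_features is left unset in the estimator
# below, so it was presumably the parameter searched.
param_test4 = ...
gsearch4 = GridSearchCV(
    estimator=RandomForestClassifier(n_estimators=70, max_depth=7, min_samples_split=50,
                                     min_samples_leaf=20, oob_score=True, random_state=10),
    param_grid=param_test4, scoring='roc_auc', cv=5)
gsearch4.fit(x, y)
print('Fourth round of parameter tuning')
print(gsearch4.cv_results_['params'])
print(gsearch4.cv_results_['mean_test_score'])
print(gsearch4.best_params_)
print(gsearch4.best_score_)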

# Model with default parameters
rf_model = RandomForestClassifier()
rf_model.fit(x, y)
expected = y
predicted = rf_model.predict(x)
# Results (measured on the same data the model was trained on)
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

# Model with the tuned parameters
new_rf_model = RandomForestClassifier(n_estimators=70, min_samples_split=50, max_depth=7, max_features=3)
new_rf_model.fit(x, y)
expected = y
predicted = new_rf_model.predict(x)
# Results (measured on the same data the model was trained on)
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

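# Persist the model. Be sure to set compress=3 here: with the joblib version used
# when this was written, leaving it unset produced many extra files with the .npy
# suffix (NumPy's array storage format). The parameter sets the compression level.
joblib.dump(new_rf_model, r"c:\users\administrator\desktop\temhhhh\rf.model", compress=3)
# Load the model back with joblib.load(path)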

# Split the data into a training set and a test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=66)
print('Training features (x)')
print(x_train)
print('Training labels (y)')
print(y_train)
print('Test features (x)')
print(x_test)
print('Test labels (y)')
print(y_test)

# Train on the training set only, then evaluate on both halves
hh_rf = RandomForestClassifier()
hh_rf.fit(x_train, y_train)
train_expected = y_train
train_predicted = hh_rf.predict(x_train)
print('Training performance')
print(metrics.classification_report(train_expected, train_predicted))
print(metrics.confusion_matrix(train_expected, train_predicted))
test_expected = y_test
test_predicted = hh_rf.predict(x_test)
print('Test performance')
print(metrics.classification_report(test_expected, test_predicted))
print(metrics.confusion_matrix(test_expected, test_predicted))
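The persistence step in the script only saves the model. Here is a sketch of a full save-and-load round trip, to check that the restored model behaves the same. The synthetic data, the small forest, and the temporary directory are assumptions made for the example; the `compress=3` setting and the `rf.model` file name follow the script.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data and a small forest, just to have something to save.
X, y = make_classification(n_samples=200, n_features=8, random_state=10)
model = RandomForestClassifier(n_estimators=10, random_state=10).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "rf.model")
    # joblib.dump returns the list of file names it wrote.
    written = joblib.dump(model, path, compress=3)
    restored = joblib.load(path)

# The restored model should make the same predictions as the original.
same = bool((restored.predict(X) == model.predict(X)).all())
print(len(written), same)
```

Comparing predictions after loading is a cheap sanity check before relying on a saved model elsewhere.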
