The Difference Between KFold and StratifiedKFold

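StratifiedKFold is used in much the same way as KFold, but it performs stratified sampling, so the proportion of each class in the training and test sets matches that of the original dataset.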

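import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Eight samples with four features each
x = np.array([
    [1, 2, 3, 4],
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
    [41, 42, 43, 44],
    [51, 52, 53, 54],
    [61, 62, 63, 64],
    [71, 72, 73, 74]
])
# Class labels: four samples of class 1 and four of class 0
y = np.array([1, 1, 0, 0, 1, 1, 0, 0])

# random_state only takes effect when shuffle=True, so it is omitted here
# (recent scikit-learn versions raise a ValueError if it is set with shuffle=False).
folder = KFold(n_splits=4, shuffle=False)
sfolder = StratifiedKFold(n_splits=4, shuffle=False)

# Stratified split: the labels in y decide how samples are spread over the folds
for train, test in sfolder.split(x, y):
    print('train: %s | test: %s' % (train, test))
print(" ")

# Plain K-fold split: folds are consecutive blocks and y is ignored
for train, test in folder.split(x, y):
    print('train: %s | test: %s' % (train, test))
print(" ")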

The output is as follows (the first four lines come from StratifiedKFold, the last four from KFold):

train: [1 3 4 5 6 7] | test: [0 2]
train: [0 2 4 5 6 7] | test: [1 3]
train: [0 1 2 3 5 7] | test: [4 6]
train: [0 1 2 3 4 6] | test: [5 7]
train: [2 3 4 5 6 7] | test: [0 1]
train: [0 1 4 5 6 7] | test: [2 3]
train: [0 1 2 3 6 7] | test: [4 5]
train: [0 1 2 3 4 5] | test: [6 7]

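The results show that StratifiedKFold performs a stratified cross-validation split: each of its test folds holds one sample of class 1 and one of class 0, and each training fold holds three of each, matching the 1:1 class ratio of the original dataset. KFold ignores the labels and slices the data in order, so every one of its test folds here contains a single class only (for example, samples 0 and 1 both belong to class 1).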
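To make the comparison concrete, the class counts in each test fold can be tallied directly. The following is only a minimal sketch: the helper class_counts_per_test_fold is a hypothetical name introduced for illustration and is not part of scikit-learn, and the feature matrix is replaced by a placeholder on the assumption that the feature values play no role in how these two splitters assign samples.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Same labels as in the example above; a placeholder feature matrix is
# enough because neither splitter looks at the feature values.
y = np.array([1, 1, 0, 0, 1, 1, 0, 0])
X = np.zeros((len(y), 1))


def class_counts_per_test_fold(splitter, X, y):
    """Return, for each test fold, the counts [n_class_0, n_class_1].

    Hypothetical helper for illustration only, not a scikit-learn function.
    """
    counts = []
    for _, test_idx in splitter.split(X, y):
        # bincount tallies how many samples of each label fall in this fold
        counts.append(np.bincount(y[test_idx], minlength=2).tolist())
    return counts


stratified_counts = class_counts_per_test_fold(StratifiedKFold(n_splits=4), X, y)
plain_counts = class_counts_per_test_fold(KFold(n_splits=4), X, y)

print("StratifiedKFold:", stratified_counts)
print("KFold:          ", plain_counts)
```

With these labels, every StratifiedKFold test fold should contain one sample per class, while each KFold test fold should contain two samples of the same class.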
