Several ways to shuffle a dataset and its labels together

It is best to convert the data to NumPy arrays first.
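The reason is that the index-based variants below (the second form of method 1, and method 2) index an array with a whole list of positions. Plain Python lists do not support this. The sketch below uses made-up sample values to show the difference:

```python
import numpy as np

# Made-up sample data held in plain Python lists.
train = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
label = ['a', 'b', 'c']
order = [2, 0, 1]  # some new row order

# Plain lists cannot be indexed by a list of positions.
try:
    label[order]
    list_indexing_failed = False
except TypeError:
    list_indexing_failed = True

# After conversion, NumPy fancy indexing reorders both arrays the same way.
train_arr = np.asarray(train)
label_arr = np.asarray(label)
shuffled_train = train_arr[order]
shuffled_label = label_arr[order]

print(list_indexing_failed)     # True
print(shuffled_label.tolist())  # ['c', 'a', 'b']
```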

Method 1: Use np.random.shuffle

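import numpy as np

state = np.random.get_state()  # save the global RNG state
np.random.shuffle(train)       # shuffle the features in place along the first axis
np.random.set_state(state)     # restore the state so the next shuffle repeats the same permutation
np.random.shuffle(label)       # shuffle the labels with that same permutation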
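Here is a minimal check of the saved-state trick. It uses made-up data in which each label equals the first feature of its row, so the pairing can be verified after shuffling. The trick assumes train and label have the same length along the first axis, because the restored state only reproduces the same permutation for sequences of equal length. The name shuffle_pair is a hypothetical helper, not a NumPy function.

```python
import numpy as np

def shuffle_pair(train, label):
    """Hypothetical helper: shuffle two arrays in place with one shared permutation."""
    if len(train) != len(label):
        raise ValueError("train and label must have the same length")
    state = np.random.get_state()   # remember the RNG state
    np.random.shuffle(train)        # permute rows of train
    np.random.set_state(state)      # rewind the RNG
    np.random.shuffle(label)        # same permutation applied to label

train = np.arange(20).reshape(10, 2)  # 10 rows, 2 features (made-up data)
label = train[:, 0].copy()            # 1-D labels; copy so it is not a view of train

shuffle_pair(train, label)
print(np.array_equal(train[:, 0], label))  # True: rows and labels still match
```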

Alternatively, shuffle a list of row indices and index both arrays with it:

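Note that indexing as train_image[train_row, :] requires at least two dimensions. If an array is 1-D, such as ['a','b','c','d'] with shape (4,), first convert it to [['a'],['b'],['c'],['d']] with shape (4, 1), for example with .reshape(-1, 1). Alternatively, drop the ", :" and index with [train_row] alone, which reorders the first axis of an array with any number of dimensions.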

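import random

train_row = list(range(len(train_label)))  # row indices 0..N-1
random.shuffle(train_row)                  # shuffle the indices in place
train_image = train_image[train_row, :]    # reorder features by the shuffled indices
train_label = train_label[train_row, :]    # reorder labels the same way (2-D label array)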
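The next sketch applies the same index-list approach to made-up data with a 1-D label array. Indexing with the index list alone works for any number of dimensions, so the reshape to (N, 1) is optional. The reshape(-1, 1) variant is included for the case where a 2-D label array is wanted anyway.

```python
import random
import numpy as np

train_image = np.arange(12).reshape(6, 2)  # 6 samples, 2 features each (made-up data)
train_label = np.array(list("abcdef"))     # 1-D labels, shape (6,)

train_row = list(range(len(train_label)))
random.shuffle(train_row)                  # shuffle the row indices in place

# Indexing with the index list alone reorders along the first axis for any number of dimensions.
shuffled_image = train_image[train_row]
shuffled_label = train_label[train_row]

# If a 2-D (N, 1) label array is preferred, reshape first; then [rows, :] also works.
label_2d = train_label.reshape(-1, 1)      # shape (6, 1)
shuffled_label_2d = label_2d[train_row, :]

print(train_row)
print(shuffled_label.tolist())
```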

Method 2: Use np.random.permutation()

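import numpy as np

shuffle_ix = np.random.permutation(len(train_data))  # a random permutation of 0..N-1
train_data = train_data[shuffle_ix, :]    # reorder features
train_label = train_label[shuffle_ix, :]  # reorder labels with the same indices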
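Method 2 extends to any number of aligned arrays. The sketch below uses a hypothetical helper named shuffle_together. It swaps in NumPy's newer np.random.default_rng Generator for the legacy np.random.permutation, so that a seed can be passed for reproducibility.

```python
import numpy as np

def shuffle_together(*arrays, seed=None):
    """Hypothetical helper: return copies of NumPy arrays reordered by one shared permutation.

    All arrays must have the same length along the first axis.
    """
    if not arrays:
        return ()
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length along the first axis")
    rng = np.random.default_rng(seed)
    shuffle_ix = rng.permutation(n)              # one permutation of 0..n-1
    return tuple(a[shuffle_ix] for a in arrays)  # same indices for every array

train_data = np.arange(10).reshape(5, 2)   # made-up features: row k is [2k, 2k+1]
train_label = np.array([0, 1, 2, 3, 4])    # label k belongs to row k
X, y = shuffle_together(train_data, train_label, seed=42)
print(X[:, 0] // 2 == y)  # every entry True: pairs preserved
```

Fancy indexing returns new arrays, so the helper leaves train_data and train_label untouched instead of shuffling them in place.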

Method 3: Use PyTorch's Dataset and DataLoader, which also let you set the batch size

import torch

dataset = torch.utils.data.TensorDataset(data, target)                       # set up the dataset: features and labels together
train_iter = torch.utils.data.DataLoader(dataset, batch_size, shuffle=True)  # set how data is fetched: shuffled mini-batches
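Below is a self-contained sketch of method 3 with made-up tensors. Each label is the first feature of its row, so the pairing can be checked in every batch. With shuffle=True, DataLoader reshuffles on each pass over the data. The explicit generator argument only makes the order reproducible. The helper name epoch_batches is hypothetical.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

def epoch_batches(batch_size, seed=0):
    """Hypothetical helper: collect one epoch of shuffled (features, labels) batches."""
    data = torch.arange(20, dtype=torch.float32).reshape(10, 2)  # 10 samples, 2 features
    target = data[:, 0].long()                                   # label = first feature
    dataset = TensorDataset(data, target)
    g = torch.Generator().manual_seed(seed)                      # reproducible shuffling
    train_iter = DataLoader(dataset, batch_size=batch_size, shuffle=True, generator=g)
    return [(X, y) for X, y in train_iter]

batches = epoch_batches(batch_size=4)
for X, y in batches:
    print(X.shape, torch.equal(X[:, 0].long(), y))  # pairs stay aligned in every batch
```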

For example, method 2 applied to a (4, 1) array:

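import numpy as np

tes = np.array([['a'], ['b'], ['c'], ['d']])  # shape (4, 1)
shuffle_ix = np.random.permutation(len(tes))  # random permutation of 0..3
shuffle_ix = shuffle_ix.tolist()              # plain Python ints, so print shows [1, 3, 0, 2] style output
print(shuffle_ix)
tes = tes[shuffle_ix, :]                      # reorder the rows
print(repr(tes))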

Output from one run (the permutation is random, so yours will differ):

[1, 3, 0, 2]
array([['b'],
       ['d'],
       ['a'],
       ['c']], dtype='<U1')
