In the previous section we implemented a simple deep neural network in which the input layer and the first hidden layer contained 28*28 and 200 neurons respectively. Since the number of neurons was relatively small, a fully-connected design was used. In real vision applications, however, the input is usually a larger RGB image, say 96*96*3. If the first hidden layer again learns 200 features, the number of parameters to learn reaches (96*96*3 + 1) * 200 ≈ 5.5*10^6. Compared with a 28*28 image patch, that is roughly 35 times more parameters, and both forward propagation and backpropagation become roughly 35 times slower.
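As a quick check on this arithmetic, the parameter count and the ratio to the 28*28 case can be reproduced in a few lines (the layer sizes are the ones quoted above):

% fully-connected parameter count for a 96*96*3 input and 200 hidden units
inputDim  = 96 * 96 * 3;                  % 27648 input units
numHidden = 200;
numParams = (inputDim + 1) * numHidden;   % +1 accounts for the bias term
fprintf('parameters: %d (~%.1fe6)\n', numParams, numParams / 1e6);
% ratio of input sizes relative to a 28*28 grayscale patch
fprintf('input ratio: %.1f\n', inputDim / (28 * 28));   % about 35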
To address this problem, researchers designed the convolutional neural network (CNN), which is built mainly on the following three ideas:
1. Local receptive fields: each hidden unit is connected only to a small patch of the input rather than to every input unit.
2. Weight sharing: the same feature (convolution kernel) is applied at every position of the image, which drastically reduces the number of parameters.
3. Pooling (subsampling): nearby feature responses are aggregated, giving a more compact representation with some translation invariance.
A convolutional neural network containing two convolutional layers and one pooling layer is shown in the figure below.
The code for convolving three-channel RGB images is as follows:
function convolvedFeatures = cnnConvolve(patchDim, numFeatures, images, W, b, ZCAWhite, meanPatch)
% Parameters:
%  patchDim    - patch (feature) dimension
%  numFeatures - number of features
%  images      - large images to convolve with, matrix in the form
%                images(r, c, channel, image number)
%  W, b        - W, b for features from the sparse autoencoder
%  ZCAWhite, meanPatch - ZCA whitening and mean patch matrices used for
%                preprocessing
% Returns:
%  convolvedFeatures - matrix of convolved features in the form
%                convolvedFeatures(featureNum, imageNum, imageRow, imageCol)

numImages = size(images, 4);
imageDim = size(images, 1);
imageChannels = size(images, 3);

% fold the whitening and mean subtraction into the convolution weights:
% W_new * x + b_new = W * ZCAWhite * (x - meanPatch) + b
W_new = W * ZCAWhite;
b_new = b - W * ZCAWhite * meanPatch;
W_new = reshape(W_new, numFeatures, patchDim*patchDim, imageChannels);

convolvedFeatures = zeros(numFeatures, numImages, imageDim - patchDim + 1, imageDim - patchDim + 1);
for imageNum = 1:numImages
  for featureNum = 1:numFeatures
    convolvedImage = zeros(imageDim - patchDim + 1, imageDim - patchDim + 1);
    for channel = 1:imageChannels
      feature = reshape(W_new(featureNum, :, channel), patchDim, patchDim);
      % flip the feature matrix because of the definition of convolution, as explained later
      feature = flipud(fliplr(squeeze(feature)));
      im = squeeze(images(:, :, channel, imageNum));
      % accumulate the per-channel convolutions into one feature map
      convolvedImage = convolvedImage + conv2(im, feature, 'valid');
    end
    % add the adjusted bias and apply the sigmoid nonlinearity
    convolvedImage = convolvedImage + b_new(featureNum);
    convolvedImage = 1 ./ (1 + exp(-convolvedImage));
    convolvedFeatures(featureNum, imageNum, :, :) = convolvedImage;
  end
end
end
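A minimal usage sketch of cnnConvolve follows; the image array and learned parameters here are random stand-ins purely to illustrate the expected shapes (in the real pipeline they come from the data set and the trained autoencoder below):

% hypothetical shapes: 8*8 patches, 400 features, 64*64 RGB images
patchDim = 8; numFeatures = 400;
images = rand(64, 64, 3, 10);                  % 10 random RGB images as stand-ins
W = rand(numFeatures, patchDim*patchDim*3);    % would come from the autoencoder
b = rand(numFeatures, 1);
ZCAWhite = eye(patchDim*patchDim*3);           % identity, i.e. whitening skipped in this sketch
meanPatch = zeros(patchDim*patchDim*3, 1);
convolvedFeatures = cnnConvolve(patchDim, numFeatures, images, W, b, ZCAWhite, meanPatch);
size(convolvedFeatures)   % 400 x 10 x 57 x 57, since 64 - 8 + 1 = 57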
The convolution weights W, b and the preprocessing parameters ZCAWhite, meanPatch are learned by a sparse autoencoder. The only difference from the earlier autoencoder is that the activation function of the output layer is changed from sigmoid to a linear activation. The code is as follows:
% subtract mean patch (hence zeroing the mean of the patches)
meanPatch = mean(patches, 2);
patches = bsxfun(@minus, patches, meanPatch);

% ZCA whitening
sigma = patches * patches' / numPatches;
[u, s, v] = svd(sigma);
ZCAWhite = u * diag(1 ./ sqrt(diag(s) + epsilon)) * u';
patches = ZCAWhite * patches;

% learn feature parameters
theta = initializeParameters(hiddenSize, visibleSize);
addpath minFunc/
options = struct;
options.Method = 'lbfgs';
options.maxIter = 400;
options.display = 'on';
[optTheta, cost] = minFunc( @(p) sparseAutoencoderLinearCost(p, ...
                                 visibleSize, hiddenSize, ...
                                 lambda, sparsityParam, ...
                                 beta, patches), ...
                            theta, options);
W = reshape(optTheta(1:visibleSize * hiddenSize), hiddenSize, visibleSize);
b = optTheta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
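Inside sparseAutoencoderLinearCost, the structural change relative to the ordinary sparse autoencoder is confined to the forward pass and the output-layer error term. A sketch of just that part, with illustrative variable names (data is visibleSize x m):

% forward pass of the linear-decoder autoencoder (sketch)
z2 = W1 * data + repmat(b1, 1, m);
a2 = 1 ./ (1 + exp(-z2));            % hidden layer keeps the sigmoid
z3 = W2 * a2 + repmat(b2, 1, m);
a3 = z3;                             % output layer is linear, not sigmoid(z3)
% the output-layer delta therefore loses the sigmoid derivative:
delta3 = -(data - a3);               % instead of -(data - a3) .* a3 .* (1 - a3)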
The convolved features are then mean-pooled; the code is as follows:
function pooledFeatures = cnnPool(poolDim, convolvedFeatures)
numImages = size(convolvedFeatures, 2);
numFeatures = size(convolvedFeatures, 1);
convolvedDim = size(convolvedFeatures, 3);

pooledFeatures = zeros(numFeatures, numImages, floor(convolvedDim / poolDim), floor(convolvedDim / poolDim));

% use mean pooling here
for imageNum = 1:numImages
  for featureNum = 1:numFeatures
    for i = 1:floor(convolvedDim / poolDim)
      for j = 1:floor(convolvedDim / poolDim)
        % average over one poolDim x poolDim region of the feature map
        poolRegion = convolvedFeatures(featureNum, imageNum, ((i-1)*poolDim+1):(i*poolDim), ((j-1)*poolDim+1):(j*poolDim));
        poolRegion = squeeze(poolRegion);
        pooledFeatures(featureNum, imageNum, i, j) = mean(mean(poolRegion));
      end
    end
  end
end
end
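Continuing the shape sketch from above: pooling the 57*57 convolved maps with a 19*19 pooling region (57 = 3 * 19) yields a 3*3 pooled map per feature and image:

poolDim = 19;                                        % 57 / 19 = 3 regions per side
pooledFeatures = cnnPool(poolDim, convolvedFeatures);
size(pooledFeatures)   % 400 x 10 x 3 x 3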
The feature vector obtained after convolution and pooling is a more compact and abstract representation of the original input image, and it can be fed directly to a classifier.
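Concretely, classification only requires flattening the pooled features into one column per image; the reshaping below is a sketch, with the classifier itself (e.g. softmax regression) left out:

% flatten pooled features into a design matrix: one column per image
numImages = size(pooledFeatures, 2);
X = permute(pooledFeatures, [1 3 4 2]);   % features x rows x cols x images
X = reshape(X, [], numImages);            % (400*3*3) x numImages feature matrix
% X(:, k) is the compact representation of image k, ready for any standard classifier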