In the previous section we implemented a simple deep neural network in which the input layer and the first hidden layer contained 28*28 and 200 neurons respectively. Since the number of neurons was relatively small, a fully-connected design was used. In real vision applications, however, the input is usually a larger RGB image, say 96*96*3. If the first hidden layer again learns 200 features, the number of parameters to learn reaches (96*96*3 + 1) * 200 ≈ 5.5*10^6. Compared with a 28*28 image patch, that is roughly 35 times more parameters, and both forward propagation and backpropagation become roughly 35 times slower.
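As a quick check on this arithmetic, the parameter count and the ratio to the 28*28 case can be reproduced in a few lines (the layer sizes are the ones quoted above):

% fully-connected parameter count for a 96*96*3 input and 200 hidden units
inputDim  = 96 * 96 * 3;                  % 27648 input units
numHidden = 200;
numParams = (inputDim + 1) * numHidden;   % +1 accounts for the bias term
fprintf('parameters: %d (~%.1fe6)\n', numParams, numParams / 1e6);
% ratio of input sizes relative to a 28*28 grayscale patch
fprintf('input ratio: %.1f\n', inputDim / (28 * 28));   % about 35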
To address this problem, researchers designed the convolutional neural network (CNN), which is built mainly on the following three ideas:
1. Local receptive fields: each hidden unit is connected only to a small patch of the input rather than to every input unit.
2. Weight sharing: the same feature (convolution kernel) is applied at every position of the image, which drastically reduces the number of parameters.
3. Pooling (subsampling): nearby feature responses are aggregated, giving a more compact representation with some translation invariance.
A convolutional neural network containing two convolutional layers and one pooling layer is shown in the figure below.
The code for convolving three-channel RGB images is as follows:
function convolvedFeatures = cnnConvolve(patchDim, numFeatures, images, W, b, ZCAWhite, meanPatch)
% Parameters:
%  patchDim    - patch (feature) dimension
%  numFeatures - number of features
%  images      - large images to convolve with, matrix in the form
%                images(r, c, channel, image number)
%  W, b        - W, b for features from the sparse autoencoder
%  ZCAWhite, meanPatch - ZCA whitening and mean patch matrices used for
%                preprocessing
% Returns:
%  convolvedFeatures - matrix of convolved features in the form
%                convolvedFeatures(featureNum, imageNum, imageRow, imageCol)

numImages = size(images, 4);
imageDim = size(images, 1);
imageChannels = size(images, 3);

% fold the whitening and mean subtraction into the convolution weights:
% W_new * x + b_new = W * ZCAWhite * (x - meanPatch) + b
W_new = W * ZCAWhite;
b_new = b - W * ZCAWhite * meanPatch;
W_new = reshape(W_new, numFeatures, patchDim*patchDim, imageChannels);

convolvedFeatures = zeros(numFeatures, numImages, imageDim - patchDim + 1, imageDim - patchDim + 1);
for imageNum = 1:numImages
  for featureNum = 1:numFeatures
    convolvedImage = zeros(imageDim - patchDim + 1, imageDim - patchDim + 1);
    for channel = 1:imageChannels
      feature = reshape(W_new(featureNum, :, channel), patchDim, patchDim);
      % flip the feature matrix because of the definition of convolution, as explained later
      feature = flipud(fliplr(squeeze(feature)));
      im = squeeze(images(:, :, channel, imageNum));
      % accumulate the per-channel convolutions into one feature map
      convolvedImage = convolvedImage + conv2(im, feature, 'valid');
    end
    % add the adjusted bias and apply the sigmoid nonlinearity
    convolvedImage = convolvedImage + b_new(featureNum);
    convolvedImage = 1 ./ (1 + exp(-convolvedImage));
    convolvedFeatures(featureNum, imageNum, :, :) = convolvedImage;
  end
end
end
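A minimal usage sketch of cnnConvolve follows; the image array and learned parameters here are random stand-ins purely to illustrate the expected shapes (in the real pipeline they come from the data set and the trained autoencoder below):

% hypothetical shapes: 8*8 patches, 400 features, 64*64 RGB images
patchDim = 8; numFeatures = 400;
images = rand(64, 64, 3, 10);                  % 10 random RGB images as stand-ins
W = rand(numFeatures, patchDim*patchDim*3);    % would come from the autoencoder
b = rand(numFeatures, 1);
ZCAWhite = eye(patchDim*patchDim*3);           % identity, i.e. whitening skipped in this sketch
meanPatch = zeros(patchDim*patchDim*3, 1);
convolvedFeatures = cnnConvolve(patchDim, numFeatures, images, W, b, ZCAWhite, meanPatch);
size(convolvedFeatures)   % 400 x 10 x 57 x 57, since 64 - 8 + 1 = 57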
The convolution weights W, b and the preprocessing parameters ZCAWhite, meanPatch are learned by a sparse autoencoder. The only difference from the earlier autoencoder is that the activation function of the output layer is changed from sigmoid to a linear activation. The code is as follows:
% subtract mean patch (hence zeroing the mean of the patches)
meanPatch = mean(patches, 2);
patches = bsxfun(@minus, patches, meanPatch);

% ZCA whitening
sigma = patches * patches' / numPatches;
[u, s, v] = svd(sigma);
ZCAWhite = u * diag(1 ./ sqrt(diag(s) + epsilon)) * u';
patches = ZCAWhite * patches;

% learn feature parameters
theta = initializeParameters(hiddenSize, visibleSize);
addpath minFunc/
options = struct;
options.Method = 'lbfgs';
options.maxIter = 400;
options.display = 'on';
[optTheta, cost] = minFunc( @(p) sparseAutoencoderLinearCost(p, ...
                                 visibleSize, hiddenSize, ...
                                 lambda, sparsityParam, ...
                                 beta, patches), ...
                            theta, options);
W = reshape(optTheta(1:visibleSize * hiddenSize), hiddenSize, visibleSize);
b = optTheta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
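Inside sparseAutoencoderLinearCost, the structural change relative to the ordinary sparse autoencoder is confined to the forward pass and the output-layer error term. A sketch of just that part, with illustrative variable names (data is visibleSize x m):

% forward pass of the linear-decoder autoencoder (sketch)
z2 = W1 * data + repmat(b1, 1, m);
a2 = 1 ./ (1 + exp(-z2));            % hidden layer keeps the sigmoid
z3 = W2 * a2 + repmat(b2, 1, m);
a3 = z3;                             % output layer is linear, not sigmoid(z3)
% the output-layer delta therefore loses the sigmoid derivative:
delta3 = -(data - a3);               % instead of -(data - a3) .* a3 .* (1 - a3)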
The convolved features are then mean-pooled; the code is as follows:
function pooledFeatures = cnnPool(poolDim, convolvedFeatures)
numImages = size(convolvedFeatures, 2);
numFeatures = size(convolvedFeatures, 1);
convolvedDim = size(convolvedFeatures, 3);

pooledFeatures = zeros(numFeatures, numImages, floor(convolvedDim / poolDim), floor(convolvedDim / poolDim));

% use mean pooling here
for imageNum = 1:numImages
  for featureNum = 1:numFeatures
    for i = 1:floor(convolvedDim / poolDim)
      for j = 1:floor(convolvedDim / poolDim)
        % average over one poolDim x poolDim region of the feature map
        poolRegion = convolvedFeatures(featureNum, imageNum, ((i-1)*poolDim+1):(i*poolDim), ((j-1)*poolDim+1):(j*poolDim));
        poolRegion = squeeze(poolRegion);
        pooledFeatures(featureNum, imageNum, i, j) = mean(mean(poolRegion));
      end
    end
  end
end
end
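Continuing the shape sketch from above: pooling the 57*57 convolved maps with a 19*19 pooling region (57 = 3 * 19) yields a 3*3 pooled map per feature and image:

poolDim = 19;                                        % 57 / 19 = 3 regions per side
pooledFeatures = cnnPool(poolDim, convolvedFeatures);
size(pooledFeatures)   % 400 x 10 x 3 x 3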
The feature vector obtained after convolution and pooling is a more compact and abstract representation of the original input image, and it can be fed directly to a classifier.
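Concretely, classification only requires flattening the pooled features into one column per image; the reshaping below is a sketch, with the classifier itself (e.g. softmax regression) left out:

% flatten pooled features into a design matrix: one column per image
numImages = size(pooledFeatures, 2);
X = permute(pooledFeatures, [1 3 4 2]);   % features x rows x cols x images
X = reshape(X, [], numImages);            % (400*3*3) x numImages feature matrix
% X(:, k) is the compact representation of image k, ready for any standard classifier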