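Depthwise Separable Convolution

The following article explains how depthwise separable convolution works:

Much of this post is compiled from these two articles.

Convolution basics

A two-dimensional matrix is described by row and col. A three-dimensional one is described by channel, row and col. A four-dimensional one adds one more parameter: batch, channel, row and col. The logical order of batch, channel, row and col depends on the data format; the common ones are NHWC and NCHW: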
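To make the difference between the two layouts concrete, here is a small sketch that computes the flat memory offset of one element under each ordering. The helper names `offset_nchw` and `offset_nhwc` are made up for this illustration and do not come from any library; a dense row-major buffer is assumed.

```python
def offset_nchw(n, c, h, w, C, H, W):
    """Flat index of element (n, c, h, w) in a dense row-major NCHW buffer."""
    return ((n * C + c) * H + h) * W + w


def offset_nhwc(n, c, h, w, C, H, W):
    """Flat index of the same element in a dense row-major NHWC buffer."""
    return ((n * H + h) * W + w) * C + c


# Hypothetical sizes: 3 channels, 4 rows, 5 columns.
C, H, W = 3, 4, 5

# Distance in memory between channel 0 and channel 1 of the same pixel.
step_nchw = offset_nchw(0, 1, 0, 0, C, H, W) - offset_nchw(0, 0, 0, 0, C, H, W)
step_nhwc = offset_nhwc(0, 1, 0, 0, C, H, W) - offset_nhwc(0, 0, 0, 0, C, H, W)
print(step_nchw, step_nhwc)  # 20 1
```

In NHWC the channels of one pixel sit next to each other, while in NCHW each channel is stored as a complete H x W plane.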

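2D convolution

2D convolution only involves col and row. (Omitted.)

3D and 4D convolution

Let's look at 3D convolution first.

Suppose the filter window is 3x3x3 (one of the 3s is in_depth). There are four such windows, used to extract four features from the same input (the count is given by out_depth, and they map to output channels 0…3). For one sample in the batch (say batch 0), the processing goes as follows:

In the actual source code, i and j index an arbitrary position in one output feature. The value at that position is obtained by convolving the window with the input.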

import numpy as np


def conv2d_multi_channel(input, w):
    """Two-dimensional convolution with multiple channels.

    Uses SAME padding with 0s, a stride of 1 and no dilation.

    input: input array with shape (height, width, in_depth)
    w: filter array with shape (fd, fd, in_depth, out_depth) with odd fd.
       in_depth is the number of input channels, and has to be the same as
       input's in_depth; out_depth is the number of output channels.

    Returns a result with shape (height, width, out_depth).
    """
    assert w.shape[0] == w.shape[1] and w.shape[0] % 2 == 1

    # Zero-pad rows and columns so the output keeps the input's height/width.
    padw = w.shape[0] // 2
    padded_input = np.pad(input,
                          pad_width=((padw, padw), (padw, padw), (0, 0)),
                          mode='constant',
                          constant_values=0)

    height, width, in_depth = input.shape
    assert in_depth == w.shape[2]
    out_depth = w.shape[3]
    output = np.zeros((height, width, out_depth))

    for out_c in range(out_depth):
        # For each output channel, perform 2d convolution summed across all
        # input channels.
        for i in range(height):
            for j in range(width):
                # Now the inner loop also works across all input channels.
                for c in range(in_depth):
                    # The two loops below could be factored out into a helper
                    # that computes the convolution for one output feature.
                    for fi in range(w.shape[0]):
                        for fj in range(w.shape[1]):
                            w_element = w[fi, fj, c, out_c]
                            output[i, j, out_c] += (
                                padded_input[i + fi, j + fj, c] * w_element)
    return output

The 4D case simply repeats the process above for every sample in the batch.
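As a rough sketch of the 4D case, the snippet below wraps a per-sample convolution in a loop over the batch. It is self-contained, so it uses its own compact per-sample routine (`conv2d_same`, a name chosen here) rather than the function listed earlier; `conv2d_batch` is likewise a hypothetical helper, and an NHWC batch layout is assumed.

```python
import numpy as np


def conv2d_same(x, w):
    """Same-padded, stride-1 convolution of one (H, W, in_depth) sample."""
    fd = w.shape[0]
    pad = fd // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    height, width, _ = x.shape
    out = np.zeros((height, width, w.shape[3]))
    for i in range(height):
        for j in range(width):
            window = xp[i:i + fd, j:j + fd, :]  # shape (fd, fd, in_depth)
            # Contract the window against all filters at once.
            out[i, j, :] = np.tensordot(window, w, axes=3)
    return out


def conv2d_batch(batch, w):
    """Apply conv2d_same independently to each sample of an NHWC batch."""
    return np.stack([conv2d_same(sample, w) for sample in batch])


rng = np.random.default_rng(0)
batch = rng.normal(size=(2, 6, 6, 3))  # 2 samples, 6x6 pixels, 3 channels
w = rng.normal(size=(3, 3, 3, 4))      # four 3x3x3 filters
out = conv2d_batch(batch, w)
print(out.shape)  # (2, 6, 6, 4)
```

The samples never interact: each slice of the output depends only on the matching slice of the input.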

References:

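Standard convolution

The original image is two-dimensional, 12x12 in size. Because it is in RGB format it has three channels, which effectively makes it three-dimensional, so the input shape is 12x12x3. The filter window is 5x5x3. With valid padding and a stride of 1, each spatial side of the output is 12 - 5 + 1 = 8, so the output image is 8x8x1.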
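The 8x8 figure follows from the usual output-size formula for a convolution without padding. A minimal sketch, where `valid_output_size` is a name invented for this example:

```python
def valid_output_size(input_size, kernel_size, stride=1):
    """Output length along one axis for a convolution with no padding."""
    return (input_size - kernel_size) // stride + 1


print(valid_output_size(12, 5))  # 8
```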

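The problem with standard convolution is that each kernel spans all input channels (the total number of channels is the depth). Every additional feature you want to detect therefore needs one more full kernel. So for standard convolution, the total number of parameters = number of features x kernel size (here 5x5x3 = 75 weights per kernel).

Depthwise separable convolution

Depthwise separable convolution takes a different approach. A standard kernel convolves all 3 channels at once; that is, one application of the kernel over the 3 channels outputs a single number.

Depthwise separable convolution is split into two steps:

The first step uses three kernels (each 5x5x1) to convolve the three channels separately, so one application outputs 3 numbers.

These three numbers then pass through a 1x1x3 kernel (the pointwise kernel), which yields one number.

So depthwise separable convolution is really implemented as two convolutions.

Step one: convolve the three channels separately, producing one feature map per channel (8x8x3):

Step two: convolve across the three channels again with a 1x1x3 kernel. The output now has the same shape as with standard convolution, 8x8x1: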
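The two steps can be sketched in NumPy as follows. This is only an illustration under the sizes used in this post (12x12x3 input, 5x5 kernels, valid padding, stride 1); the function names `depthwise_conv_valid` and `pointwise_conv` are my own, and the weights are random placeholders.

```python
import numpy as np


def depthwise_conv_valid(x, dw):
    """Step 1: convolve each channel with its own 2-D kernel.

    x:  (H, W, C) input; dw: (k, k, C), one k x k kernel per channel.
    Returns an array of shape (H - k + 1, W - k + 1, C).
    """
    k = dw.shape[0]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((out_h, out_w, x.shape[2]))
    for i in range(out_h):
        for j in range(out_w):
            # Sum over the spatial window only; channels stay separate.
            out[i, j, :] = np.sum(x[i:i + k, j:j + k, :] * dw, axis=(0, 1))
    return out


def pointwise_conv(x, pw):
    """Step 2: mix the channels at every pixel with 1x1 kernels.

    x: (H, W, C); pw: (C, out_depth). Returns (H, W, out_depth).
    """
    return x @ pw


rng = np.random.default_rng(0)
image = rng.normal(size=(12, 12, 3))
dw = rng.normal(size=(5, 5, 3))  # three 5x5x1 depthwise kernels
pw = rng.normal(size=(3, 1))     # one 1x1x3 pointwise kernel

step1 = depthwise_conv_valid(image, dw)
step2 = pointwise_conv(step1, pw)
print(step1.shape, step2.shape)  # (8, 8, 3) (8, 8, 1)
```

Using a `pw` of shape (3, 256) instead would give an 8x8x256 output while reusing the same depthwise result.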

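To extract more features, you only need to add more 1x1x3 kernels (the figure is quoted from the original article. I think the 8x8x256 cube would be better drawn as 256 separate 8x8x1 maps, because they are not one solid block; they represent 256 different features):

As you can see, if only one feature is extracted, depthwise separable convolution is worse than standard convolution in terms of parameter count. As the number of features to extract grows, depthwise separable convolution saves more and more parameters.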
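The trade-off is easy to check by counting weights for the 5x5 kernel and 3 input channels used above. The sketch ignores bias terms, and both function names are made up for this example.

```python
def standard_params(k, in_depth, out_depth):
    """Weights in a standard convolution: one k x k x in_depth kernel per feature."""
    return k * k * in_depth * out_depth


def separable_params(k, in_depth, out_depth):
    """Weights in a depthwise separable convolution.

    One k x k kernel per input channel, plus one 1 x 1 x in_depth
    pointwise kernel per output feature.
    """
    return k * k * in_depth + in_depth * out_depth


for out_depth in (1, 256):
    print(out_depth,
          standard_params(5, 3, out_depth),
          separable_params(5, 3, out_depth))
# 1 75 78
# 256 19200 843
```

With a single feature the separable version needs 3 more weights; with 256 features it needs 843 instead of 19200.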

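Proof

There is an article that proves depthwise separable convolution and standard convolution are equivalent (I will write it up if there is a need):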
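The snippet below is not the proof from that article, only a numerical check of one direction that I am sure of: any depthwise separable convolution can be rewritten as a standard convolution whose kernel is W[fi, fj, c, o] = D[fi, fj, c] * P[c, o]. The converse does not hold in general, since an arbitrary standard kernel need not factor this way. The sizes and random weights are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(12, 12, 3))
dw = rng.normal(size=(5, 5, 3))  # depthwise kernels D, one per channel
pw = rng.normal(size=(3, 4))     # pointwise weights P, 4 output features
k = dw.shape[0]
out_size = x.shape[0] - k + 1    # valid padding, stride 1, square input

# Combined standard kernel: W[fi, fj, c, o] = D[fi, fj, c] * P[c, o]
combined = dw[:, :, :, None] * pw[None, None, :, :]  # shape (5, 5, 3, 4)

separable = np.zeros((out_size, out_size, pw.shape[1]))
standard = np.zeros_like(separable)
for i in range(out_size):
    for j in range(out_size):
        window = x[i:i + k, j:j + k, :]
        per_channel = np.sum(window * dw, axis=(0, 1))  # depthwise step
        separable[i, j, :] = per_channel @ pw           # pointwise step
        standard[i, j, :] = np.tensordot(window, combined, axes=3)

print(np.allclose(separable, standard))  # True
```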

Choosing the parameters
