An intuitive understanding of 1D, 2D, and 3D convolutions in convolutional neural networks

Question

Can anyone clearly explain, with examples, the difference between 1D, 2D, and 3D convolutions in convolutional neural networks (in deep learning)?

Recommended answer

I'd like to explain using pictures from C3D.

In a nutshell, the convolution direction and the output shape are what matter!

↑↑↑↑↑ 1D Convolutions - Basic ↑↑↑↑↑

only 1 direction (the time axis) is used to compute the convolution
input = [W], filter = [k], output = [W]
e.g. input = [1,1,1,1,1], filter = [0.25,0.5,0.25], output = [1,1,1,1,1] (interior values; with zero 'SAME' padding the two border values come out as 0.75)
the output shape is a 1D array
example: graph smoothing

import tensorflow as tf
import numpy as np

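# TensorFlow 1.x API; under TensorFlow 2.x, call tf.compat.v1.disable_eager_execution()
# first and create the session with tf.compat.v1.Session() instead
sess = tf.Session()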

ones_1d = np.ones(5)
weight_1d = np.ones(3)
strides_1d = 1

in_1d = tf.constant(ones_1d, dtype=tf.float32)
filter_1d = tf.constant(weight_1d, dtype=tf.float32)

in_width = int(in_1d.shape[0])
filter_width = int(filter_1d.shape[0])

input_1d   = tf.reshape(in_1d, [1, in_width, 1])
kernel_1d = tf.reshape(filter_1d, [filter_width, 1, 1])
output_1d = tf.squeeze(tf.nn.conv1d(input_1d, kernel_1d, strides_1d, padding='SAME'))
print(sess.run(output_1d))
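
The smoothing example from the list above can be checked without TensorFlow. This is a minimal NumPy sketch, not part of the original answer; it assumes zero padding, which corresponds to padding='SAME'. Note that `np.convolve` flips the kernel while `tf.nn.conv1d` does not, but for a symmetric kernel such as [0.25, 0.5, 0.25] the results coincide.

```python
import numpy as np

signal = np.ones(5)                      # input  = [W], W = 5
kernel = np.array([0.25, 0.5, 0.25])     # filter = [k], k = 3

# mode="same" zero-pads the signal so the output keeps length W
smoothed = np.convolve(signal, kernel, mode="same")
print(smoothed.shape)   # (5,)
print(smoothed)         # interior values are 1.0, the two border values are 0.75
```

The border values drop to 0.75 because one of the three taps falls on the zero padding there.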

↑↑↑↑↑ 2D Convolutions - Basic ↑↑↑↑↑

2 directions (x, y) are used to compute the convolution
the output shape is a 2D matrix
input = [W, H], filter = [k, k], output = [W, H]
example: Sobel edge filter

ones_2d = np.ones((5,5))
weight_2d = np.ones((3,3))
strides_2d = [1, 1, 1, 1]

in_2d = tf.constant(ones_2d, dtype=tf.float32)
filter_2d = tf.constant(weight_2d, dtype=tf.float32)

in_width = int(in_2d.shape[0])
in_height = int(in_2d.shape[1])

filter_width = int(filter_2d.shape[0])
filter_height = int(filter_2d.shape[1])

input_2d   = tf.reshape(in_2d, [1, in_height, in_width, 1])
kernel_2d = tf.reshape(filter_2d, [filter_height, filter_width, 1, 1])

output_2d = tf.squeeze(tf.nn.conv2d(input_2d, kernel_2d, strides=strides_2d, padding='SAME'))
print(sess.run(output_2d))
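
To make the Sobel edge filter example concrete, here is a small NumPy sketch. The helper `conv2d_same` is a hypothetical name introduced only for illustration; it is a naive loop implementation with zero padding, and the test image is made up.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Naive 2D cross-correlation with zero 'SAME' padding (odd kernel sizes)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(image.shape, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            # The window slides in two directions: y and x
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * kernel)
    return out

# Toy image with a vertical edge: left half 0, right half 1
image = np.zeros((5, 6))
image[:, 3:] = 1.0

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

edges = conv2d_same(image, sobel_x)
print(edges.shape)   # (5, 6): a 2D input gives a 2D output
print(edges[2])      # strong response (4.0) in the two columns next to the edge
```

The negative value in the last column of each row is an artifact of the zero padding at the right border, not a real edge.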

↑↑↑↑↑ 3D Convolutions - Basic ↑↑↑↑↑

3 directions (x, y, z) are used to compute the convolution
the output shape is a 3D volume
input = [W, H, L], filter = [k, k, d], output = [W, H, M]
d < L is important! It lets the filter slide along the depth axis, which is what makes the output a volume
example: C3D

ones_3d = np.ones((5,5,5))
weight_3d = np.ones((3,3,3))
strides_3d = [1, 1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
in_depth = int(in_3d.shape[2])

filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])
filter_depth = int(filter_3d.shape[2])

input_3d   = tf.reshape(in_3d, [1, in_depth, in_height, in_width, 1])
kernel_3d = tf.reshape(filter_3d, [filter_depth, filter_height, filter_width, 1, 1])

output_3d = tf.squeeze(tf.nn.conv3d(input_3d, kernel_3d, strides=strides_3d, padding='SAME'))
print(sess.run(output_3d))
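
Why d < L matters is easiest to see without padding. The sketch below is an illustration in NumPy rather than the original TensorFlow code; `conv3d_valid` is a hypothetical helper, and it assumes 'VALID' (no padding) so that the output depth is M = L - d + 1. It needs NumPy 1.20 or newer for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv3d_valid(volume, kernel):
    """3D cross-correlation without padding ('VALID')."""
    # windows has shape (W', H', M, k, k, d): one k x k x d patch per output position
    windows = sliding_window_view(volume, kernel.shape)
    return np.einsum("xyzijk,ijk->xyz", windows, kernel)

volume = np.ones((5, 5, 5))                             # input = [W, H, L], L = 5

out_small = conv3d_valid(volume, np.ones((3, 3, 3)))    # d = 3 < L
out_full = conv3d_valid(volume, np.ones((3, 3, 5)))     # d = L

print(out_small.shape)   # (3, 3, 3): the filter moves along depth, output is a volume
print(out_full.shape)    # (3, 3, 1): no room to move along depth, output is effectively 2D
```

With d = L the filter covers the whole depth at once, which is exactly the situation of a 2D convolution over a multi-channel input.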

↑↑↑↑↑ 2D Convolutions with 3D input - LeNet, VGG, ..., ↑↑↑↑↑

even though the input is 3D, e.g. 224x224x3 or 112x112x32,
the output shape is not a 3D volume but a 2D matrix,
because the filter depth = L must match the number of input channels = L
only 2 directions (x, y) are used to compute the convolution, not 3
input = [W, H, L], filter = [k, k, L], output = [W, H]
the output shape is a 2D matrix
what if we want to train N filters (N is the number of filters)?
then the output shape is a stack of 2D matrices, i.e. 3D = 2D x N.

in_channels = 32 # 3 for RGB, 32, 64, 128, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have a 3D shape whose depth equals in_channels
weight_3d = np.ones((3,3,in_channels))
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])

input_3d   = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_3d = tf.reshape(filter_3d, [filter_height, filter_width, in_channels, 1])

output_2d = tf.squeeze(tf.nn.conv2d(input_3d, kernel_3d, strides=strides_2d, padding='SAME'))
print(sess.run(output_2d))

conv2d - LeNet, VGG, ... for N filters

in_channels = 32 # 3 for RGB, 32, 64, 128, ...
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have 3D shape x number of filters = 4D
weight_4d = np.ones((3,3,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])

input_3d   = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

#output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
print(sess.run(output_3d))
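
The shape bookkeeping for a 2D convolution over a 3D input can also be written out in NumPy. This is a rough sketch under the assumption of 'VALID' padding (so a 5x5 input shrinks to 3x3); `conv2d_multichannel` is a hypothetical helper name, not a TensorFlow function.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_multichannel(image, filters):
    """image: [H, W, L]; filters: [k, k, L, N]; returns [H', W', N] ('VALID')."""
    k_h, k_w, depth, _ = filters.shape
    # The window spans the full channel depth, so it can only slide in y and x
    windows = sliding_window_view(image, (k_h, k_w, depth))[:, :, 0]  # (H', W', k, k, L)
    return np.einsum("yxijc,ijcn->yxn", windows, filters)

image = np.ones((5, 5, 32))                       # input = [W, H, L], L = 32 channels

single = conv2d_multichannel(image, np.ones((3, 3, 32, 1)))
stacked = conv2d_multichannel(image, np.ones((3, 3, 32, 64)))

print(single[..., 0].shape)   # (3, 3): one filter gives one 2D matrix
print(stacked.shape)          # (3, 3, 64): N = 64 filters give 64 stacked 2D matrices
```

Each output value here is 3 * 3 * 32 = 288, the sum over one full-depth patch of ones.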

↑↑↑↑↑ Bonus 1x1 conv in CNN - GoogLeNet, ..., ↑↑↑↑↑

1x1 conv is confusing when you think of it as a 2D image filter like Sobel
for 1x1 conv in a CNN, the input has a 3D shape, as in the picture above
it computes filtering along the depth (channel) axis
input = [W, H, L], filter = [1, 1, L], output = [W, H]
with N filters, the stacked output shape is 3D = 2D x N matrix.

in_channels = 32 # 3 for RGB, 32, 64, 128, ...
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have 3D shape x number of filters = 4D; its spatial size is 1x1
weight_4d = np.ones((1,1,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])

input_3d   = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

#output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
print(sess.run(output_3d))
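
One way to demystify the 1x1 case: it applies the same linear map across channels at every pixel. The NumPy sketch below illustrates that equivalence with made-up random data; the variable names are for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_map = rng.normal(size=(5, 5, 32))   # input = [W, H, L], L = 32
weights = rng.normal(size=(32, 64))         # a [1, 1, L, N] kernel with the two unit axes dropped

# 1x1 convolution: mix the L channels of each pixel into N output channels
out = np.einsum("hwc,cn->hwn", feature_map, weights)
print(out.shape)   # (5, 5, 64)

# The same result as a plain matrix product over the flattened pixels
flat = feature_map.reshape(-1, 32) @ weights
print(np.allclose(out, flat.reshape(5, 5, 64)))   # True
```

No neighbouring pixels are involved, so the spatial size stays the same and only the channel count changes from L to N.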

Animation (2D Conv with 3D-inputs)

- Original Link : LINK
- The author: Martin Görner
- Twitter: @martin_gorner
- Google +: plus.google.com/+MartinGorne

↑↑↑↑↑ 1D Convolutions with 1D input ↑↑↑↑↑

↑↑↑↑↑ 1D Convolutions with 2D input ↑↑↑↑↑

even though the input is 2D, e.g. 20x14,
the output shape is not 2D but a 1D array,
because the filter height = L must match the input height = L
only 1 direction (x) is used to compute the convolution, not 2
input = [W, L], filter = [k, L], output = [W]
the output shape is a 1D array
what if we want to train N filters (N is the number of filters)?
then the output shape is a stack of 1D arrays, i.e. 2D = 1D x N.
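
As a quick check of these shapes, here is a NumPy sketch of a 1D convolution over a 2D input. It assumes 'VALID' padding (so W = 20 becomes 18 for k = 3), uses the 20x14 size from the list above, and picks N = 8 arbitrarily; `conv1d_multichannel` is a hypothetical helper name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv1d_multichannel(seq, filters):
    """seq: [W, L]; filters: [k, L, N]; returns [W', N] ('VALID')."""
    k, depth, _ = filters.shape
    # The window spans the full height L, so it can only slide along x
    windows = sliding_window_view(seq, (k, depth))[:, 0]   # (W', k, L)
    return np.einsum("wkc,kcn->wn", windows, filters)

seq = np.ones((20, 14))                                    # input = [W, L]

out_one = conv1d_multichannel(seq, np.ones((3, 14, 1)))    # a single filter [k, L]
out_many = conv1d_multichannel(seq, np.ones((3, 14, 8)))   # N = 8 filters

print(out_one[:, 0].shape)   # (18,): one filter gives a 1D array
print(out_many.shape)        # (18, 8): N filters give N stacked 1D arrays
```

This is the typical layout for sequence data, where W is the time axis and L holds the features at each time step.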

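# Note: this block is a 3D convolution over a 4D input [W, H, L, in_channels]
# with N = out_channels filters (the C3D case), not a 1D convolution
in_channels = 32 # 3, 32, 64, 128, ...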
out_channels = 64 # 3, 32, 64, 128, ...
ones_4d = np.ones((5,5,5,in_channels))
weight_5d = np.ones((3,3,3,in_channels,out_channels))
strides_3d = [1, 1, 1, 1, 1]

in_4d = tf.constant(ones_4d, dtype=tf.float32)
filter_5d = tf.constant(weight_5d, dtype=tf.float32)

in_width = int(in_4d.shape[0])
in_height = int(in_4d.shape[1])
in_depth = int(in_4d.shape[2])

filter_width = int(filter_5d.shape[0])
filter_height = int(filter_5d.shape[1])
filter_depth = int(filter_5d.shape[2])

input_4d   = tf.reshape(in_4d, [1, in_depth, in_height, in_width, in_channels])
kernel_5d = tf.reshape(filter_5d, [filter_depth, filter_height, filter_width, in_channels, out_channels])

output_4d = tf.nn.conv3d(input_4d, kernel_5d, strides=strides_3d, padding='SAME')
print(sess.run(output_4d))

sess.close()

Input & Output in TensorFlow