问题描述
任何人都可以通过示例清楚地解释卷积神经网络(在深度学习中)的1D,2D和3D卷积之间的区别吗?
Can anyone please clearly explain the difference between 1D, 2D, and 3D convolutions in convolutional neural networks (in deep learning) with the use of examples?
推荐答案
我想用 C3D 的图片进行解释a>.
I want to explain with picture from C3D.
概括地说,卷积方向& 输出形状很重要!
In a nutshell, convolutional direction & output shape is important!
↑↑↑↑↑ 一维卷积-基本 ↑↑↑↑↑
↑↑↑↑↑ 1D Convolutions - Basic ↑↑↑↑↑
- 只需 1 方向(时间轴)即可计算转化
- 输入= [W],过滤器= [k],输出= [W]
- ex)输入= [1,1,1,1,1],过滤器= [0.25,0.5,0.25],输出= [1,1,1,1,1]
- 输出形状是一维数组
- 示例)图形平滑
- just 1-direction (time-axis) to calculate conv
- input = [W], filter = [k], output = [W]
- ex) input = [1,1,1,1,1], filter = [0.25,0.5,0.25], output = [1,1,1,1,1]
- output-shape is 1D array
- example) graph smoothing
import tensorflow as tf
import numpy as np
sess = tf.Session()
ones_1d = np.ones(5)
weight_1d = np.ones(3)
strides_1d = 1
in_1d = tf.constant(ones_1d, dtype=tf.float32)
filter_1d = tf.constant(weight_1d, dtype=tf.float32)
in_width = int(in_1d.shape[0])
filter_width = int(filter_1d.shape[0])
input_1d = tf.reshape(in_1d, [1, in_width, 1])
kernel_1d = tf.reshape(filter_1d, [filter_width, 1, 1])
output_1d = tf.squeeze(tf.nn.conv1d(input_1d, kernel_1d, strides_1d, padding='SAME'))
print sess.run(output_1d)
↑↑↑↑↑ 2D卷积-基本 ↑↑↑↑↑
↑↑↑↑↑ 2D Convolutions - Basic ↑↑↑↑↑
- 2 方向(x,y)计算转化率
- 输出形状为 2D 矩阵
- 输入= [W,H],过滤器= [k,k]输出= [W,H]
- 示例) Sobel Egde Fllter
- 2-direction (x,y) to calculate conv
- output-shape is 2D Matrix
- input = [W, H], filter = [k,k] output = [W,H]
- example) Sobel Egde Fllter
ones_2d = np.ones((5,5))
weight_2d = np.ones((3,3))
strides_2d = [1, 1, 1, 1]
in_2d = tf.constant(ones_2d, dtype=tf.float32)
filter_2d = tf.constant(weight_2d, dtype=tf.float32)
in_width = int(in_2d.shape[0])
in_height = int(in_2d.shape[1])
filter_width = int(filter_2d.shape[0])
filter_height = int(filter_2d.shape[1])
input_2d = tf.reshape(in_2d, [1, in_height, in_width, 1])
kernel_2d = tf.reshape(filter_2d, [filter_height, filter_width, 1, 1])
output_2d = tf.squeeze(tf.nn.conv2d(input_2d, kernel_2d, strides=strides_2d, padding='SAME'))
print sess.run(output_2d)
↑↑↑↑↑ 3D卷积-基本 ↑↑↑↑↑
↑↑↑↑↑ 3D Convolutions - Basic ↑↑↑↑↑
- 3 方向(x,y,z)计算转化率
- 输出形状为 3D 体积
- 输入= [W,H, L ],过滤器= [k,k, d ]输出= [W,H,M]
- d< L 很重要!用于输出音量
- 示例)C3D
- 3-direction (x,y,z) to calcuate conv
- output-shape is 3D Volume
- input = [W,H,L], filter = [k,k,d] output = [W,H,M]
- d < L is important! for making volume output
- example) C3D
ones_3d = np.ones((5,5,5))
weight_3d = np.ones((3,3,3))
strides_3d = [1, 1, 1, 1, 1]
in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)
in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
in_depth = int(in_3d.shape[2])
filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])
filter_depth = int(filter_3d.shape[2])
input_3d = tf.reshape(in_3d, [1, in_depth, in_height, in_width, 1])
kernel_3d = tf.reshape(filter_3d, [filter_depth, filter_height, filter_width, 1, 1])
output_3d = tf.squeeze(tf.nn.conv3d(input_3d, kernel_3d, strides=strides_3d, padding='SAME'))
print sess.run(output_3d)
↑↑↑↑↑ 具有3D输入的2D卷积 -LeNet,VGG,...,↑↑↑↑↑
↑↑↑↑↑ 2D Convolutions with 3D input - LeNet, VGG, ..., ↑↑↑↑↑
- 事件输入为3D,例如)224x224x3、112x112x32
- 输出形状不是 3D 体积,而是 2D 矩阵
- 因为过滤器深度= L 必须与输入通道= L 匹配
- 2 方向(x,y)计算转换!不是3D
- 输入= [W,H, L ],过滤器= [k,k, L ]输出= [W,H]
- 输出形状为 2D 矩阵
- 如果我们要训练N个过滤器(N是过滤器数)
- 然后,输出形状为(堆叠2D) 3D = 2D x N 矩阵.
- Eventhough input is 3D ex) 224x224x3, 112x112x32
- output-shape is not 3D Volume, but 2D Matrix
- because filter depth = L must be matched with input channels = L
- 2-direction (x,y) to calcuate conv! not 3D
- input = [W,H,L], filter = [k,k,L] output = [W,H]
- output-shape is 2D Matrix
- what if we want to train N filters (N is number of filters)
- then output shape is (stacked 2D) 3D = 2D x N matrix.
in_channels = 32 # 3 for RGB, 32, 64, 128, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have 3d-shpae with in_channels
weight_3d = np.ones((3,3,in_channels))
strides_2d = [1, 1, 1, 1]
in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)
in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])
input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_3d = tf.reshape(filter_3d, [filter_height, filter_width, in_channels, 1])
output_2d = tf.squeeze(tf.nn.conv2d(input_3d, kernel_3d, strides=strides_2d, padding='SAME'))
print sess.run(output_2d)
conv2d-LeNet,VGG ...用于N个过滤器
in_channels = 32 # 3 for RGB, 32, 64, 128, ...
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have 3d-shpae x number of filters = 4D
weight_4d = np.ones((3,3,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]
in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)
in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])
input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])
#output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
print sess.run(output_3d)
↑↑↑↑↑ 在CNN中获得1x1转化奖励 -GoogLeNet,...,↑↑↑↑↑
↑↑↑↑↑ Bonus 1x1 conv in CNN - GoogLeNet, ..., ↑↑↑↑↑
- 当您将其视为像sobel这样的2D图像滤镜时,
- 1x1转换会令人困惑
- 对于CNN中的1x1转换,输入为3D形状,如上图所示.
- 它计算深度过滤
- 输入= [W,H,L],过滤器= [1,1,L] 输出= [W,H]
- 输出堆叠形状为 3D = 2D x N 矩阵.
- 1x1 conv is confusing when you think this as 2D image filter like sobel
- for 1x1 conv in CNN, input is 3D shape as above picture.
- it calculate depth-wise filtering
- input = [W,H,L], filter = [1,1,L] output = [W,H]
- output stacked shape is 3D = 2D x N matrix.
in_channels = 32 # 3 for RGB, 32, 64, 128, ...
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((1,1,in_channels)) # input is 3d, in_channels = 32
# filter must have 3d-shpae x number of filters = 4D
weight_4d = np.ones((3,3,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]
in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)
in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])
input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])
#output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
print sess.run(output_3d)
动画(具有3D输入的2D转换)
-原始链接: LINK
-作者:马丁·戈尔纳(MartinGörner)
-Twitter:@martin_gorner
-Google +:plus.google.com/+MartinGorne
Animation (2D Conv with 3D-inputs)
- Original Link : LINK
- The author: Martin Görner
- Twitter: @martin_gorner
- Google +: plus.google.com/+MartinGorne
↑↑↑↑↑ 具有1D输入的1D卷积 ↑↑↑↑↑
↑↑↑↑↑ 1D Convolutions with 1D input ↑↑↑↑↑
↑↑↑↑↑ 具有2D输入的一维卷积 ↑↑↑↑↑
↑↑↑↑↑ 1D Convolutions with 2D input ↑↑↑↑↑
- 事件输入为2D,例如20x14
- 输出形状不是 2D ,而是 1D 矩阵
- 因为过滤器高度= L 必须与输入高度= L 相匹配
- 1 -方向(x)计算转换!不是2D
- 输入= [W, L ],过滤器= [k, L ]输出= [W]
- 输出形状为 1D 矩阵
- 如果我们要训练N个过滤器(N是过滤器数)
- 然后,输出形状为(堆叠1D) 2D = 1D x N 矩阵.
- Eventhough input is 2D ex) 20x14
- output-shape is not 2D , but 1D Matrix
- because filter height = L must be matched with input height = L
- 1-direction (x) to calcuate conv! not 2D
- input = [W,L], filter = [k,L] output = [W]
- output-shape is 1D Matrix
- what if we want to train N filters (N is number of filters)
- then output shape is (stacked 1D) 2D = 1D x N matrix.
in_channels = 32 # 3, 32, 64, 128, ...
out_channels = 64 # 3, 32, 64, 128, ...
ones_4d = np.ones((5,5,5,in_channels))
weight_5d = np.ones((3,3,3,in_channels,out_channels))
strides_3d = [1, 1, 1, 1, 1]
in_4d = tf.constant(ones_4d, dtype=tf.float32)
filter_5d = tf.constant(weight_5d, dtype=tf.float32)
in_width = int(in_4d.shape[0])
in_height = int(in_4d.shape[1])
in_depth = int(in_4d.shape[2])
filter_width = int(filter_5d.shape[0])
filter_height = int(filter_5d.shape[1])
filter_depth = int(filter_5d.shape[2])
input_4d = tf.reshape(in_4d, [1, in_depth, in_height, in_width, in_channels])
kernel_5d = tf.reshape(filter_5d, [filter_depth, filter_height, filter_width, in_channels, out_channels])
output_4d = tf.nn.conv3d(input_4d, kernel_5d, strides=strides_3d, padding='SAME')
print sess.run(output_4d)
sess.close()
输入&在Tensorflow中输出
这篇关于对卷积神经网络中1D,2D和3D卷积的直觉理解的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!