Question
I am having a bit of trouble understanding the dimensions of the tensors used in the set up of convolutional neural networks using TensorFlow. For example, in this tutorial, the 28x28 MNIST images are represented like this:
import tensorflow as tf
x = tf.placeholder(tf.float32, shape=[None, 784])
x_image = tf.reshape(x, [-1,28,28,1])
Assuming I have ten training images, the reshaping above makes my input x_image a collection of ten sub-collections of twenty-eight 28-dimensional column vectors.
It seems more natural to use
x_image_natural = tf.reshape(x, [-1,28,28])
instead, which would return ten 28x28 matrices.
Illustration:
import numpy as np

a = np.array(range(8))
opt1 = a.reshape(-1, 2, 2, 1)
opt2 = a.reshape(-1, 2, 2)
print(opt1)
print(opt2)
# opt1 - column vectors
>> [[[[0]
>>    [1]]
>>   [[2]
>>    [3]]]
>>  [[[4]
>>    [5]]
>>   [[6]
>>    [7]]]]
# opt2 - matrices
>> [[[0 1]
>>   [2 3]]
>>  [[4 5]
>>   [6 7]]]
In a similar vein, is there an intuitive way to understand why the convolutional layers have dimensions (height_of_patch, width_of_patch, num_input_layers, num_output_layers)? The transpose seems more intuitive, in that it is ultimately a collection of patch-sized matrices.
* EDIT *
I'm actually curious about why the dimensions of the tensors are ordered the way they are.
For the inputs, X, why don't we use
x_image = tf.reshape(x, [-1,i,28,28])
which would create batch_size, i-sized arrays of 28x28 matrices (where i is the number of input layers)?
Similarly, why aren't the weight tensors shaped like (num_output_layers, num_input_layers, input_height, input_width), which again seems more intuitive in that it is a collection of 'patch matrices'?
Answer
The way that one layer of 2-D convolution works is by sliding a 2D window/filter/patch across the input to compute "feature maps". Put into the context of this MNIST dataset, the inputs are grayscale images, so they have the dimensions [height, width, num_channels] ([28, 28, 1]). Say you decide to use a 3x3 window/filter/patch; this determines the first two dimensions of the weights of this convolution layer (height_of_patch=3, width_of_patch=3). The reason for sliding across the height and width dimensions is to share neurons and preserve statistical invariance (a bird is still a bird no matter where it appears in the picture); it also lowers the amount of computation. Each channel/depth is thought of as carrying unique information (in the RGB case, R=255 and G=255 say completely different things), and we do not want to share neurons across different depths/channels. Hence the third dimension of the weights of a convolution layer is identical to the input's depth dimension (num_input_layers=1 in the first convolution layer in the MNIST case). The last dimension of the weights of a convolution layer is a hyperparameter the user gets to decide. This number determines how many feature maps are produced by this convolution layer, and the bigger the value, the higher the computation cost.
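To make the shapes concrete, here is a minimal sketch in the TF 1.x style of the tutorial; the 3x3 patch and the output depth of 32 are arbitrary illustrative choices, not values taken from the question:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 784])
x_image = tf.reshape(x, [-1, 28, 28, 1])  # [batch_size, height, width, num_channels]

# [height_of_patch, width_of_patch, num_input_layers, num_output_layers]
W = tf.Variable(tf.truncated_normal([3, 3, 1, 32], stddev=0.1))

feature_maps = tf.nn.conv2d(x_image, W, strides=[1, 1, 1, 1], padding='SAME')
print(feature_maps.shape)  # (?, 28, 28, 32): one 28x28 feature map per output layer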
A quick summary. For any 2D convolution layer, assume it receives an input X with dimensions:
X - [batch_size, input_height, input_width, input_depth]
Then the weights w of this convolution layer have dimensions:
w - [filter_height, filter_width, input_depth, output_depth]
This convolution layer outputs a y with dimensions:
y - [batch_size, output_height, output_width, output_depth]
Typically people make filter_height=filter_width, and often set filter_height to 3, 5, or 7. output_depth is a hyperparameter the user gets to decide. output_height and output_width are determined by input_height, input_width, filter_height, filter_width, the stride, and the padding choice.
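For reference, these are the standard formulas TensorFlow uses for its two padding modes; conv_output_size is just a hypothetical helper name for illustration:

import math

def conv_output_size(input_size, filter_size, stride, padding):
    # Hypothetical helper (not a TensorFlow function): output size along one spatial dimension.
    if padding == 'SAME':
        return math.ceil(input_size / stride)  # input is padded so the window always fits
    if padding == 'VALID':
        return math.ceil((input_size - filter_size + 1) / stride)  # no padding at all
    raise ValueError("padding must be 'SAME' or 'VALID'")

print(conv_output_size(28, 3, 1, 'SAME'))   # 28
print(conv_output_size(28, 3, 1, 'VALID'))  # 26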
For more information, I'd encourage reading the Stanford CS231 notes on ConvNets; I personally find them very clearly and insightfully explained.
Order of the dimensions
As far as the order of the dimensions goes, to my knowledge it's more of a convention than a matter of "right" or "wrong". For one sample input, I think it's intuitive to order its dimensions as [height, width, channels/depth]. As a matter of fact, you can simply stick a sample matrix with this order of dimensions into import matplotlib.pyplot as plt; plt.imshow(sample_matrix) to plot a human-eye-friendly image. I think the first three weight dimensions follow the same conventional order of [height, width, depth]. I speculate that this consistency makes it easy to perform the convolution operation, as I read that one of the common implementations of this step is to flatten the 3D tensor into 2D and use matrix multiplication libraries underneath. I imagine you could change the order of the dimensions to whatever you want, as long as the actual computation between dimensions is done correctly.
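To illustrate that last point, below is a rough numpy sketch of the flatten-and-multiply (im2col) idea for a single image, with stride 1 and no padding; conv2d_via_matmul is a hypothetical function written for this answer, not how TensorFlow is actually implemented:

import numpy as np

def conv2d_via_matmul(image, weights):
    # image:   [height, width, input_depth]
    # weights: [filter_height, filter_width, input_depth, output_depth]
    kh, kw, in_d, out_d = weights.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    # Gather every kh x kw x in_d patch and flatten it into one row.
    patches = np.stack([image[i:i + kh, j:j + kw, :].ravel()
                        for i in range(oh) for j in range(ow)])  # [oh*ow, kh*kw*in_d]
    # Flatten the weights the same way; one matrix multiplication then does all the work.
    out = patches @ weights.reshape(-1, out_d)                   # [oh*ow, out_d]
    return out.reshape(oh, ow, out_d)

y = conv2d_via_matmul(np.random.rand(28, 28, 1), np.random.rand(3, 3, 1, 8))
print(y.shape)  # (26, 26, 8)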