Question
In MNIST LSTM examples, I don't understand what "hidden layer" means. Is it the imaginary layer formed when you represent an unrolled RNN over time?
Why is num_units = 128 in most cases?
Answer
The number of hidden units is a direct representation of the learning capacity of a neural network -- it reflects the number of learned parameters. The value 128 was likely selected arbitrarily or empirically. You can change that value experimentally and rerun the program to see how it affects the training accuracy (you can get better than 90% test accuracy with a lot fewer hidden units). Using more units makes it more likely to perfectly memorize the complete training set (although it will take longer, and you run the risk of over-fitting).
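As a minimal sketch (assuming the TensorFlow 1.x API and the usual MNIST-as-sequence setup, where each 28x28 image is fed as 28 time-steps of 28 pixels), num_units is simply a constructor argument you can vary and re-test:

```python
import tensorflow as tf  # assumes TensorFlow 1.x

num_units = 128  # try e.g. 32 or 64 and compare test accuracy

# Each MNIST image becomes a sequence: 28 time-steps of 28 features.
inputs = tf.placeholder(tf.float32, [None, 28, 28])  # [batch, time, features]

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=num_units)
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

# outputs has shape [batch, time, num_units]: one num_units-wide
# vector per time-step, regardless of the input feature width.
print(outputs.shape)  # (?, 28, 128)
```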
The key thing to understand, which is somewhat subtle in the famous Colah's blog post (find "each line carries an entire vector"), is that X is an array of data (nowadays often called a tensor) -- it is not meant to be a scalar value. Where, for example, the tanh function is shown, it is meant to imply that the function is broadcast across the entire array (an implicit for loop) -- and not simply performed once per time-step.
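To make the broadcasting concrete, here is a small illustration (NumPy is used purely for demonstration; the array shapes are made up for this example):

```python
import numpy as np

x = np.random.randn(100, 128)  # a whole batch of 128-wide vectors

# One call applies tanh element-wise across the entire array --
# the implicit "for" loop the diagram leaves out:
y = np.tanh(x)

# The equivalent explicit loops, just slower and more verbose:
y_loop = np.array([[np.tanh(v) for v in row] for row in x])
assert np.allclose(y, y_loop)
```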
As such, the hidden units represent tangible storage within the network, which is manifested primarily in the size of the weights array. And because an LSTM actually does have a bit of its own internal storage separate from the learned model parameters, it has to know how many units there are -- which ultimately needs to agree with the size of the weights. In the simplest case, an RNN has no internal storage -- so it doesn't even need to know in advance how many "hidden units" it is being applied to.
- A good answer to a similar question here.
- You can look at the source for BasicLSTMCell in TensorFlow to see exactly how this is used.
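For a rough sense of how num_units drives the size of the weights, this back-of-the-envelope sketch assumes the shapes used in TensorFlow's BasicLSTMCell source (one fused kernel for the four gates); the helper function itself is hypothetical:

```python
def lstm_param_count(input_size, num_units):
    # [x; h] is concatenated and multiplied into all 4 gates at once:
    kernel = (input_size + num_units) * 4 * num_units
    bias = 4 * num_units
    return kernel + bias

print(lstm_param_count(28, 128))  # 80384 learned parameters
print(lstm_param_count(28, 32))   # 7808 -- far smaller capacity
```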
Side note: This notation is very common in statistics and machine learning, and in other fields that process large batches of data with a common formula (3D graphics is another example). It takes a bit of getting used to for people who expect to see their for loops written out explicitly.