I'm trying to understand and implement a multi-layer LSTM. The problem is I don't know how the layers connect. I have two possibilities in mind:

At each timestep, the hidden state H of the first LSTM becomes the input of the second LSTM.

At each timestep, the hidden state H of the first LSTM becomes the initial value for the hidden state of the second LSTM, and the input of the first LSTM also becomes the input of the second LSTM.

Please help!

Solution

TLDR: Each LSTM cell at time t and level l has input x(t) and hidden state h(l, t). In the first layer, the inputs are the actual sequence input x(t) and the previous hidden state h(l, t-1); in every higher layer, the input is the hidden state of the corresponding cell in the previous layer, h(l-1, t).

From https://arxiv.org/pdf/1710.02254.pdf:

To increase the capacity of GRU networks (Hermans and Schrauwen 2013), recurrent layers can be stacked on top of each other. Since the GRU does not have two output states, the same output hidden state h'2 is passed to the next vertical layer. In other words, the h1 of the next layer will be equal to h'2. This forces the GRU to learn transformations that are useful along depth as well as time.
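To make the wiring concrete, here is a minimal sketch that stacks two LSTM layers by hand with PyTorch's nn.LSTMCell. The sizes and variable names (input_size, hidden_size, cell1, cell2, etc.) are illustrative assumptions, not part of the original answer; the point is only that layer 1 consumes the sequence input x(t), while layer 2 consumes layer 1's hidden state h(1, t) at the same timestep, and each layer keeps its own recurrence over time.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not from the original post)
input_size, hidden_size, seq_len, batch = 8, 16, 5, 3

cell1 = nn.LSTMCell(input_size, hidden_size)   # layer 1: input is x(t)
cell2 = nn.LSTMCell(hidden_size, hidden_size)  # layer 2: input is h(1, t)

x = torch.randn(seq_len, batch, input_size)    # dummy input sequence

# Each layer has its own hidden and cell state, carried across time
h1 = torch.zeros(batch, hidden_size); c1 = torch.zeros(batch, hidden_size)
h2 = torch.zeros(batch, hidden_size); c2 = torch.zeros(batch, hidden_size)

outputs = []
for t in range(seq_len):
    h1, c1 = cell1(x[t], (h1, c1))  # h(1, t) from x(t) and h(1, t-1)
    h2, c2 = cell2(h1, (h2, c2))    # h(2, t) from h(1, t) and h(2, t-1)
    outputs.append(h2)

out = torch.stack(outputs)          # (seq_len, batch, hidden_size)
```

This is the same wiring that nn.LSTM(input_size, hidden_size, num_layers=2) applies internally, so in practice you would normally use the built-in multi-layer module rather than stacking cells yourself.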