How do you pass video features from a CNN to an LSTM?

Question

After you pass a video frame through a convnet and get an output feature map, how do you pass that data into an LSTM? Also, how do you pass multiple frames to the LSTM through the CNN?
In other words, I want to process video frames with a CNN to get the spatial features. Then I want to pass these features to an LSTM to do temporal processing on the spatial features. How do I connect the LSTM to the video features? For example, if the input video is 56x56 and, after passing through all of the CNN layers, it comes out as 20 feature maps of 5x5, how are these connected to the LSTM on a frame-by-frame basis? And should they go through a fully connected layer first? Thanks, Jon

Recommended answer

Basically, you can flatten each frame's features and feed them into one LSTM cell (one time step per frame). The same applies to the CNN output: feed the CNN output for each frame into one LSTM cell.
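
To make the frame-by-frame wiring concrete, here is a minimal sketch in plain Python, using the 20 feature maps of 5x5 from the question. Everything in it is an assumption made for illustration: the random numbers stand in for real CNN outputs, and `TinyLSTMCell`, `flatten_feature_maps`, `run_lstm_over_frames`, the hidden size of 16 and the 8-frame clip are hypothetical names and values, not part of the answer or of any library. In practice you would use your framework's LSTM layer instead of a hand-written cell.

```python
import math
import random


def flatten_feature_maps(maps):
    """Flatten one frame's CNN output (channels x height x width) into a 1-D list."""
    return [v for channel in maps for row in channel for v in row]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMCell:
    """Hypothetical, unoptimized LSTM cell with randomly initialized weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        scale = 1.0 / math.sqrt(n)
        # One weight matrix and bias per gate: input (i), forget (f),
        # candidate (g) and output (o).
        self.W = {g: [[rng.uniform(-scale, scale) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifgo"}
        self.b = {g: [0.0] * hidden_size for g in "ifgo"}

    def _gate(self, name, z):
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.W[name], self.b[name])]

    def step(self, x, h, c):
        """Process one time step: x is one frame's flattened feature vector."""
        z = x + h  # concatenate the current input with the previous hidden state
        i = [sigmoid(v) for v in self._gate("i", z)]
        f = [sigmoid(v) for v in self._gate("f", z)]
        g = [math.tanh(v) for v in self._gate("g", z)]
        o = [sigmoid(v) for v in self._gate("o", z)]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def run_lstm_over_frames(frame_feature_maps, cell):
    """Feed each frame's flattened features to the cell, one frame per time step."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    outputs = []
    for maps in frame_feature_maps:
        x = flatten_feature_maps(maps)  # 20 x 5 x 5 -> 500 values
        h, c = cell.step(x, h, c)       # the state carries over between frames
        outputs.append(h)
    return outputs


# Stand-in for CNN outputs: 8 frames, each with 20 feature maps of 5x5.
rng = random.Random(1)
video_features = [[[[rng.random() for _ in range(5)] for _ in range(5)]
                   for _ in range(20)] for _ in range(8)]

cell = TinyLSTMCell(input_size=20 * 5 * 5, hidden_size=16)
hidden_states = run_lstm_over_frames(video_features, cell)
print(len(hidden_states), len(hidden_states[0]))  # prints: 8 16
```

The point of the sketch is the loop in `run_lstm_over_frames`: the same cell is applied once per frame, and the hidden and cell states are what carry temporal information from one frame to the next.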

As for a fully connected (FC) layer before the LSTM, that is up to you.
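
If you do add an FC layer, one possible arrangement is to project each flattened frame vector to a smaller size before it reaches the LSTM. The sketch below assumes a 500-value input (20 x 5 x 5) and an arbitrary output size of 64 with a ReLU activation; `make_dense` and `project` are hypothetical helpers, the weights are random, and the frame vectors are placeholder data rather than real CNN features.

```python
import random


def make_dense(in_features, out_features, seed=0):
    """Return a function computing ReLU(Wx + b) with random weights (illustrative only)."""
    rng = random.Random(seed)
    scale = 1.0 / in_features ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(in_features)]
               for _ in range(out_features)]
    bias = [0.0] * out_features

    def dense(x):
        return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(weights, bias)]

    return dense


flat_size = 20 * 5 * 5            # flattened CNN output per frame
project = make_dense(flat_size, 64)

# Placeholder flattened features for 8 frames.
frame_vectors = [[0.01 * ((t + k) % 7) for k in range(flat_size)]
                 for t in range(8)]

# The same FC layer is applied to every frame; the LSTM would then
# receive one 64-value vector per time step instead of 500 values.
lstm_inputs = [project(x) for x in frame_vectors]
print(len(lstm_inputs), len(lstm_inputs[0]))  # prints: 8 64
```

Whether this helps depends on the task; the sketch only shows where such a layer would sit and how it changes the LSTM's input size.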

Take a look at the network structure in http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-180.pdf.
