Question
As context, I am relatively new to the world of machine learning and I am attempting a project with a goal of classifying plays in an NBA game. My inputs are a sequence of 40 frames from each play in an NBA game and my labels are 11 all-encompassing classifications for a given play.
The plan is to take each sequence of frames and pass each frame into a CNN to extract a set of features. Then each sequence of features from a given video would be passed on to an RNN.
I am currently using Keras for most of my implementation and I chose to use a VGG16 model for my CNN. Here is some of the relevant code below:
video = keras.Input(shape = (None, 255, 255, 3), name = 'video')
cnn = keras.applications.VGG16(include_top=False, weights=None,
                               input_shape=(255, 255, 3), pooling='avg', classes=11)
cnn.trainable = True
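For reference, the plan described above (a per-frame CNN followed by an RNN) could be wired up roughly as follows. This is only a sketch under assumptions that are not in the code above: the helper name build_video_classifier is hypothetical, and the TimeDistributed wrapper, the frozen CNN, and the LSTM with 64 units are illustrative choices rather than a recommended design.

```python
from keras import applications, layers, models


def build_video_classifier(num_frames=40, frame_size=255, num_classes=11,
                           weights="imagenet", rnn_units=64):
    """Per-frame VGG16 features followed by an LSTM (illustrative sketch)."""
    # Convolutional base; global average pooling yields one 512-d vector per frame.
    cnn = applications.VGG16(include_top=False, weights=weights,
                             input_shape=(frame_size, frame_size, 3),
                             pooling="avg")
    cnn.trainable = False  # assumption: keep the CNN frozen at first

    video = layers.Input(shape=(num_frames, frame_size, frame_size, 3),
                         name="video")
    # Apply the same CNN to every frame: (batch, frames, 512).
    features = layers.TimeDistributed(cnn)(video)
    # Summarize the sequence of frame features with an LSTM.
    encoded = layers.LSTM(rnn_units)(features)
    outputs = layers.Dense(num_classes, activation="softmax")(encoded)
    return models.Model(inputs=video, outputs=outputs)
```

Calling build_video_classifier() with the defaults would build the 40-frame, 11-class model using ImageNet weights; passing weights=None builds the same architecture with random initialization.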
My question is - would it still be beneficial for me to initialize the weights of the VGG16 ConvNet to 'imagenet' if my goal is to classify video clips of NBA games? If so, why? If not, how can I train the VGG16 ConvNet to get my own set of weights and then how can I insert them into this function? I have had little luck finding any tutorials where someone included their own set of weights when using the VGG16 model.
I apologize if my questions seem naive but I would really appreciate any help in clearing this up.
Recommended answer
Should you retrain VGG16 for your specific task? Absolutely not! Retraining such a huge network is hard, and requires lots of intuition and knowledge in training deep networks. Let's analyze why you can use the weights, pre-trained on ImageNet, for your task:
- ImageNet is a huge dataset, containing millions of images. VGG16 itself was trained in 3-4 days or so on a powerful GPU. On a CPU (assuming that you don't have a GPU as powerful as an NVIDIA GeForce Titan X), training would take weeks.
- ImageNet contains images of real-world scenes. NBA games can also be considered real-world scenes, so it is very likely that features pre-trained on ImageNet can be used for NBA games, too.
Actually, you don't need to use all of the convolutional layers of the pre-trained VGG16. Let's take a look at the visualization of the internal VGG16 layers and see what they detect (taken from this article; the image is too large, so I put just a link for compactness):
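- The first and second convolutional blocks look at low-level features such as corners, edges, etc.
- The third and fourth convolutional blocks look at surface features, curves, circles, etc.
- The fifth block looks at high-level features.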
So, you can decide which kind of features will be beneficial for your specific task. Do you need the high-level features of the 5th block? Or might you want to use the mid-level features of the 3rd block? Maybe you want to stack another neural network on top of the bottom layers of VGG? For more instruction, take a look at the following tutorial, which I wrote; it was once on SO Documentation.
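As a rough illustration of picking a block, the sketch below cuts VGG16 at a named layer and uses everything up to it as a fixed feature extractor. The helper name build_feature_extractor and its default arguments are hypothetical; the layer names (such as block3_pool) are the ones reported by the model summary.

```python
from keras import applications, models


def build_feature_extractor(layer_name="block3_pool", input_shape=(160, 160, 3),
                            weights="imagenet"):
    """Return a model that outputs the activations of one named VGG16 layer."""
    vgg = applications.VGG16(weights=weights, include_top=False,
                             input_shape=input_shape)
    # get_layer looks a layer up by the name shown in vgg.summary().
    chosen_output = vgg.get_layer(layer_name).output
    extractor = models.Model(inputs=vgg.input, outputs=chosen_output)
    extractor.trainable = False  # use the layers as fixed feature detectors
    return extractor
```

Choosing block5_pool instead of block3_pool would give the high-level features, at the cost of running the full convolutional stack.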
In this example, three brief sub-examples and a set of tips are presented:
- Loading weights from the available pre-trained models included with the Keras library
- Stacking another network for training on top of any layer of VGG
- Inserting a layer in the middle of other layers
- Tips and general rules of thumb for fine-tuning and transfer learning with VGG
Models pre-trained on ImageNet, including VGG-16 and VGG-19, are available in Keras. From here on, this example uses VGG-16. For more information, see the Keras Applications documentation.
from keras import applications
from keras.layers import Input

# Load the whole VGG16 network, including the top Dense layers.
# Note: including the top layers forces the input tensor shape to be
# (224, 224, 3), so this model can only be used on 224x224 images.
vgg_model = applications.VGG16(weights='imagenet', include_top=True)

# Load only the convolutional part. Without the top layers the input
# tensor shape is (None, None, 3), so the model accepts images of any size.
vgg_model = applications.VGG16(weights='imagenet', include_top=False)

# Load the convolutional part with an explicit input tensor.
input_tensor = Input(shape=(160, 160, 3))
vgg_model = applications.VGG16(weights='imagenet',
                               include_top=False,
                               input_tensor=input_tensor)

# To see the model's architecture and layer names, run the following.
vgg_model.summary()
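On the question of plugging in your own weights: the weights argument also accepts a path to a weights file, not only None or 'imagenet'. The sketch below round-trips a network through such a file. The function name save_and_reload_vgg16 and the file name my_vgg16.weights.h5 are hypothetical, and the "trained" model is only a randomly initialized stand-in so that nothing has to be downloaded.

```python
import os
import tempfile

from keras import applications


def save_and_reload_vgg16(input_shape=(64, 64, 3)):
    """Round-trip VGG16 convolutional weights through a file (sketch)."""
    # Stand-in for a network you trained yourself; here it is just randomly
    # initialized so the sketch needs no download.
    trained = applications.VGG16(weights=None, include_top=False,
                                 input_shape=input_shape)
    path = os.path.join(tempfile.mkdtemp(), "my_vgg16.weights.h5")
    trained.save_weights(path)

    # `weights` accepts None, 'imagenet', or a path to a weights file.
    reloaded = applications.VGG16(weights=path, include_top=False,
                                  input_shape=input_shape)
    return trained, reloaded
```

The file must have been saved from a network with the same architecture (same include_top setting), otherwise loading is expected to fail.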
Create a new network with bottom layers taken from VGG
Assume that for some specific task for images with the size (160, 160, 3), you want to use the pre-trained bottom layers of VGG, up to the layer named block2_pool.
from keras import applications
from keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D
from keras.models import Model

vgg_model = applications.VGG16(weights='imagenet',
                               include_top=False,
                               input_shape=(160, 160, 3))

# Create a dictionary that maps layer names to the layers
layer_dict = {layer.name: layer for layer in vgg_model.layers}

# Get the output tensor of the last VGG layer that we want to include
x = layer_dict['block2_pool'].output

# Stack a new simple convolutional network on top of it
x = Conv2D(filters=64, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Flatten()(x)
x = Dense(256, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(10, activation='softmax')(x)

# Create the new model. Please note that this is NOT a Sequential() model.
custom_model = Model(inputs=vgg_model.input, outputs=x)

# Make sure that the pre-trained bottom layers (the input layer plus
# blocks 1 and 2, i.e. the first 7 layers) are not trainable
for layer in custom_model.layers[:7]:
    layer.trainable = False

# Do not forget to compile it
custom_model.compile(loss='categorical_crossentropy',
                     optimizer='rmsprop',
                     metrics=['accuracy'])
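To see what tensor the new layers receive when VGG is cut at a given block, the shape arithmetic can be worked out by hand. The helper below is a hypothetical pure-Python sketch; it assumes the standard VGG16 layout (size-preserving 3x3 convolutions and one 2x2, stride-2 max-pool per block) and channels-last ordering.

```python
# Output channels of the five VGG16 convolutional blocks, in order.
VGG16_BLOCK_CHANNELS = [64, 128, 256, 512, 512]


def vgg16_block_output_shapes(height, width):
    """Return {pool layer name: (height, width, channels)} for a given input size.

    Assumes the standard VGG16 layout: 'same'-padded 3x3 convolutions that keep
    the spatial size, and a 2x2 max-pool with stride 2 at the end of each block.
    """
    shapes = {}
    for block, channels in enumerate(VGG16_BLOCK_CHANNELS, start=1):
        height, width = height // 2, width // 2  # the pool halves each side
        shapes[f"block{block}_pool"] = (height, width, channels)
    return shapes


shapes = vgg16_block_output_shapes(160, 160)
print(shapes["block2_pool"])  # (40, 40, 128)
```

So for (160, 160, 3) inputs, the network stacked on block2_pool starts from a 40x40 feature map with 128 channels.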
Remove multiple layers and insert a new one in the middle
Assume that you need to speed up VGG16 by replacing block1_conv1 and block1_conv2 with a single convolutional layer, in such a way that the pre-trained weights of the remaining layers are kept. The idea is to disassemble the whole network into separate layers, then assemble it back. Here is the code specifically for this task:
from keras import applications
from keras.layers import Conv2D
from keras.models import Model

vgg_model = applications.VGG16(include_top=True, weights='imagenet')

# Disassemble layers
layers = [l for l in vgg_model.layers]

# Define the new convolutional layer.
# Important: the number of filters should be the same!
# Note: the receptive field of two 3x3 convolutions is 5x5.
new_conv = Conv2D(filters=64,
                  kernel_size=(5, 5),
                  name='new_conv',
                  padding='same')(vgg_model.input)

# Now stack everything back, skipping layers[1] and layers[2]
# (block1_conv1 and block1_conv2).
# Note: If you are going to fine-tune the model, do not forget to
# mark the other layers as un-trainable
x = new_conv
for i in range(3, len(layers)):
    layers[i].trainable = False
    x = layers[i](x)

# Final touch
result_model = Model(inputs=vgg_model.input, outputs=x)
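The note in the code that two 3x3 convolutions have a 5x5 receptive field can be checked with the usual recurrence for stacked layers. This is a small pure-Python sketch; the function name receptive_field and the (kernel_size, stride) input format are assumptions made for illustration.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of conv/pool layers.

    `layers` is a list of (kernel_size, stride) pairs, ordered from input to output.
    """
    field = 1  # a single output unit sees one input pixel before any layer
    jump = 1   # distance in input pixels between neighbouring units
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
    return field


two_3x3 = receptive_field([(3, 1), (3, 1)])
one_5x5 = receptive_field([(5, 1)])
print(two_3x3, one_5x5)  # 5 5
```

Both stacks cover the same 5x5 input window, which is why a single 5x5 convolution is a plausible replacement; it does not reproduce the intermediate non-linearity, so the new layer still has to be trained.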