python - Keras: What is model.inputs in VGG16?

I recently started using Keras and VGG16, and I am using keras.applications.vgg16.

My question is about what model.inputs is. I saw other people use it in https://github.com/keras-team/keras/blob/master/examples/conv_filter_visualization.py, even though they never initialize it.

    ...
    input_img = model.input
    ...
    layer_output = layer_dict[layer_name].output
    if K.image_data_format() == 'channels_first':
        loss = K.mean(layer_output[:, filter_index, :, :])
    else:
        loss = K.mean(layer_output[:, :, :, filter_index])

    # we compute the gradient of the input picture wrt this loss
    grads = K.gradients(loss, input_img)[0]

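I checked the Keras site, but it only says that this is an input tensor of shape (1, 224, 224, 3), and I still don't understand what that is. Is it an image from ImageNet, or a default image that Keras provides for Keras models?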

Sorry if I don't know enough about deep learning, but could someone explain this to me? Thanks.

Best answer

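The four dimensions of (1, 224, 224, 3) are, in order, batch_size, image_height, image_width and image_channels. (1, 224, 224, 3) therefore means the VGG16 model takes a batch of size 1 (one image at a time) of 224x224 images with three channels (RGB). Note that model.input is not an actual image, neither from ImageNet nor a default one shipped with Keras: it is a symbolic input tensor (a placeholder) that stands for whatever batch you feed to the model later. In the built model the batch dimension is usually reported as None, which means any batch size is accepted.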
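To make the axis order concrete, here is a small NumPy sketch. It uses an all-zeros array as a stand-in for real image data (an assumption purely for illustration) and shows how the batch, height, width and channel axes are indexed, plus how the same batch would look under the channels_first layout mentioned in the question's code.

```python
import numpy as np

# A hypothetical batch holding a single 224x224 RGB image (all zeros here).
# Axis order is channels_last: (batch, height, width, channels).
batch = np.zeros((1, 224, 224, 3), dtype=np.float32)
batch_size, height, width, channels = batch.shape
print(batch_size, height, width, channels)  # 1 224 224 3

# batch[0] is the single image; batch[0, :, :, 0] is its red channel.
first_image = batch[0]
red_channel = batch[0, :, :, 0]
print(first_image.shape, red_channel.shape)  # (224, 224, 3) (224, 224)

# Stacking several images only changes the first (batch) dimension.
four_images = np.zeros((4, 224, 224, 3), dtype=np.float32)
print(four_images.shape)  # (4, 224, 224, 3)

# With K.image_data_format() == 'channels_first' the same batch
# would be laid out as (batch, channels, height, width).
channels_first = np.transpose(batch, (0, 3, 1, 2))
print(channels_first.shape)  # (1, 3, 224, 224)
```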

For more information on what a batch and the batch size are, you can check this Cross Validated question.

Back to VGG16: the input of the architecture is (1, 224, 224, 3). What does this mean? To feed an image into the network, you need to:

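preprocess it so that it has a size of (224, 224) and 3 channels (RGB)
convert it into an actual array of shape (224, 224, 3)
group images together into a batch of the size the network expects (here the batch size is 1, but you still need to add a dimension to the array to get (1, 224, 224, 3))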
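The three steps above can be sketched with plain NumPy. The helper `to_vgg16_batch` is hypothetical, and it uses a crude nearest-neighbour resize as an assumption for illustration; the Keras utilities shown later in this answer do the loading and resizing through PIL instead. The random 480x640 RGBA array is a stand-in for a real photo.

```python
import numpy as np

def to_vgg16_batch(img, size=(224, 224)):
    """Hypothetical helper: turn an HxW or HxWxC uint8 image into a
    (1, 224, 224, 3) float32 batch, using nearest-neighbour resizing."""
    img = np.asarray(img)
    # Step 1a: make sure there are exactly 3 channels (RGB).
    if img.ndim == 2:                      # grayscale -> repeat into 3 channels
        img = np.stack([img] * 3, axis=-1)
    img = img[..., :3]                     # drop an alpha channel if present
    # Step 1b: resize to (224, 224) by picking the nearest source pixel.
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    resized = img[rows][:, cols]
    # Step 2: an actual float array of shape (224, 224, 3).
    x = resized.astype(np.float32)
    # Step 3: add the batch dimension -> (1, 224, 224, 3).
    return np.expand_dims(x, axis=0)

# A random 480x640 RGBA "photo" stands in for a real image file.
rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(480, 640, 4), dtype=np.uint8)
batch = to_vgg16_batch(photo)
print(batch.shape, batch.dtype)  # (1, 224, 224, 3) float32
```

Only the last step changes the number of dimensions; `np.expand_dims(x, axis=0)` is the same call the Keras snippet below uses.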

Once this is done, you can feed the image into the model.

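Keras provides a few utility functions for these tasks. Below is a modified version of the "Extract features with VGG16" snippet from the Usage examples for image classification models section of the documentation.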

To actually run it, you need a JPEG image of any size named elephant.jpg. You can get one by running the following bash command:

wget https://upload.wikimedia.org/wikipedia/commons/f/f9/Zoorashia_elephant.jpg -O elephant.jpg

For clarity, I will split the code into image preprocessing and model prediction:

Loading the image

import numpy as np
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

You can add print statements along the way to see what is happening, but here is a short summary:

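image.load_img() loads the file as a PIL image, already in RGB, and resizes it to (224, 224)
image.img_to_array() converts this image into an array of shape (224, 224, 3). If you access x[0, 0, 0], you get the red component of the first pixel, a number between 0 and 255
np.expand_dims(x, axis=0) adds the first (batch) dimension. After it, x has shape (1, 224, 224, 3)
preprocess_input applies the extra preprocessing required by architectures trained on ImageNet. From its docstring (run help(preprocess_input)), you can see that it: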

converts the images from RGB to BGR, then zero-centers each color channel with respect to the ImageNet dataset, without scaling

This appears to match the standard input preprocessing used for the ImageNet training set.
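The docstring description above can be reproduced in a few lines of NumPy. This is a hypothetical re-implementation for illustration, not the Keras source: the function name `vgg16_style_preprocess` is made up, and the per-channel means are, to my knowledge, the BGR-ordered ImageNet means Keras uses in its "caffe" preprocessing mode (check `help(preprocess_input)` for your version).

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as (to my knowledge) used by
# Keras' 'caffe'-mode preprocessing, the mode VGG16's preprocess_input uses.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_style_preprocess(x):
    """Hypothetical re-implementation for illustration: RGB -> BGR,
    then subtract the ImageNet mean per channel, without scaling."""
    x = np.asarray(x, dtype=np.float32)
    bgr = x[..., ::-1]               # reverse the channel axis: RGB -> BGR
    return bgr - IMAGENET_BGR_MEAN   # zero-center each channel, no scaling

# A 1x1 "image" whose RGB value equals the ImageNet mean maps to zeros.
pixel = np.array([[[[123.68, 116.779, 103.939]]]], dtype=np.float32)  # (1, 1, 1, 3)
out = vgg16_style_preprocess(pixel)
print(out.shape)  # (1, 1, 1, 3)
```

Because there is no scaling, the result still spans roughly -124 to +152 rather than 0 to 1, which is why a network trained this way expects exactly this preprocessing.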

That's it. Now you just need to feed the image into the pre-trained model to get a prediction.

Prediction

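from keras.applications.vgg16 import VGG16
base_model = VGG16(weights='imagenet')  # downloads the ImageNet weights on first use
y_hat = base_model.predict(x)
print(y_hat.shape)  # (1, 1000)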

y_hat contains the probability that the model assigns to this image for each of the 1000 ImageNet classes.
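Since running the real model needs the downloaded weights, the sketch below fakes `y_hat` with a softmax over random logits (an assumption for illustration only; VGG16's last 1000-way layer does end in a softmax). It shows what the (1, 1000) shape means: one row per image, one probability per class, summing to one, with the most likely class found by `argmax`.

```python
import numpy as np

# Stand-in for base_model.predict(x): random logits pushed through a softmax.
rng = np.random.default_rng(42)
logits = rng.normal(size=(1, 1000))
y_hat = np.exp(logits - logits.max(axis=1, keepdims=True))  # numerically stable softmax
y_hat /= y_hat.sum(axis=1, keepdims=True)

print(y_hat.shape)                    # (1, 1000): one row per image in the batch
print(round(float(y_hat.sum()), 6))   # 1.0: the probabilities sum to one
best = int(np.argmax(y_hat[0]))       # index of the most likely ImageNet class
print(best, float(y_hat[0, best]))
```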

To get the class names and a readable output, Keras also provides a utility function:

from keras.applications.vgg16 import decode_predictions
decode_predictions(y_hat)

The output, for the Zoorashia_elephant.jpg image I downloaded earlier:

[[('n02504013', 'Indian_elephant', 0.48041093),
  ('n02504458', 'African_elephant', 0.47474155),
  ('n01871265', 'tusker', 0.03912963),
  ('n02437312', 'Arabian_camel', 0.0038948185),
  ('n01704323', 'triceratops', 0.00062475674)]]

Looks pretty good!
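Conceptually, decode_predictions sorts each row of probabilities and attaches the ImageNet WordNet ID and class name to the top entries. The sketch below is a hypothetical illustration of that idea, not the Keras implementation: `top_k_predictions` is a made-up name, and the 4-class `class_index` is a toy mapping reusing labels from the output above, whereas Keras loads the full 1000-class index itself.

```python
import numpy as np

def top_k_predictions(probs, class_index, top=5):
    """Hypothetical sketch of the idea behind decode_predictions: for each
    row, sort the class probabilities and attach (wnid, name) labels."""
    results = []
    for row in probs:
        best = np.argsort(row)[::-1][:top]  # class indices, highest probability first
        results.append([(*class_index[int(i)], float(row[i])) for i in best])
    return results

# Tiny hypothetical 4-class mapping; the real one covers 1000 ImageNet classes.
class_index = {
    0: ('n02504013', 'Indian_elephant'),
    1: ('n02504458', 'African_elephant'),
    2: ('n01871265', 'tusker'),
    3: ('n02437312', 'Arabian_camel'),
}
probs = np.array([[0.48, 0.47, 0.04, 0.01]])
decoded = top_k_predictions(probs, class_index, top=3)
print(decoded)
# [[('n02504013', 'Indian_elephant', 0.48), ('n02504458', 'African_elephant', 0.47), ('n01871265', 'tusker', 0.04)]]
```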

Regarding python - Keras: What is model.inputs in VGG16, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53395427/