Caffe with multiple input images

Question

I'm looking at implementing a Caffe CNN which accepts two input images and a label (later perhaps other data) and was wondering if anyone was aware of the correct syntax in the prototxt file for doing this? Is it simply an IMAGE_DATA layer with additional tops? Or should I use separate IMAGE_DATA layers for each?

Thanks,
James

Recommended answer

Edit: I have been using the HDF5_DATA layer lately for this and it is definitely the way to go.

HDF5 is a key-value store, where each key is a string and each value is a multi-dimensional array. Thus, to use the HDF5_DATA layer, just add a new key for each top you want to use, and set the value for that key to the image you want to use. Writing these HDF5 files from Python is easy:

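import h5py
import numpy as np

# get_some_image / get_another_image are placeholders for your own loaders;
# each is expected to return an H x W x C array.
# Note (an assumption to verify against your Caffe version): the HDF5 data
# layer generally treats the first axis as the sample index and expects
# float data, so a leading batch axis and a float32 cast may be needed.
filelist = []
for i in range(100):
    image1 = get_some_image(i)
    image2 = get_another_image(i)
    filename = '/tmp/my_hdf5%d.h5' % i
    with h5py.File(filename, 'w') as f:
        # Reorder H x W x C to Caffe's C x H x W layout; one key per top.
        f['data1'] = np.transpose(image1, (2, 0, 1))
        f['data2'] = np.transpose(image2, (2, 0, 1))
    filelist.append(filename)

# The layer's source is a text file listing one HDF5 path per line.
with open('/tmp/filelist.txt', 'w') as f:
    for filename in filelist:
        f.write(filename + '\n')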

Then simply set the source of the HDF5_DATA param to be '/tmp/filelist.txt', and set the tops to be "data1" and "data2".
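As a rough illustration of that layer definition, the sketch below builds the prototxt text in Python. It assumes the older `layers { ... type: HDF5_DATA ... }` syntax that matches the layer names used in this answer; newer Caffe releases write `layer { ... type: "HDF5Data" ... }` instead. The helper name `hdf5_data_layer`, the layer name "data", and the batch size of 10 are illustrative choices, not values from the original answer.

```python
def hdf5_data_layer(source, tops, batch_size, name="data"):
    """Return prototxt text for an HDF5_DATA layer (old V1 'layers' syntax)."""
    # One 'top' entry per HDF5 key that should be exposed as a blob.
    top_lines = "\n".join('  top: "%s"' % top for top in tops)
    return (
        "layers {\n"
        '  name: "%s"\n'
        "  type: HDF5_DATA\n"
        "%s\n"
        "  hdf5_data_param {\n"
        '    source: "%s"\n'
        "    batch_size: %d\n"
        "  }\n"
        "}\n"
    ) % (name, top_lines, source, batch_size)


# batch_size=10 is an arbitrary example value.
prototxt = hdf5_data_layer("/tmp/filelist.txt", ["data1", "data2"], batch_size=10)
print(prototxt)
```

The top names must match the keys written into the HDF5 files, which is why "data1" and "data2" are reused here.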

I'm leaving the original response below:

====================================================

There are two good ways of doing this. The easiest is probably to use two separate IMAGE_DATA layers, one with the first image and label, and a second with the second image. Caffe retrieves images from LMDB or LEVELDB, which are key-value stores, and assuming you create your two databases with corresponding images having the same integer ID key, Caffe will in fact load the images correctly, and you can proceed to construct your net with the data/labels of both layers.

The problem with this approach is that having two data layers is not really very satisfying, and it doesn't scale very well if you want to do more advanced things like having non-integer labels for things like bounding boxes, etc. If you're prepared to make a time investment in this, you can do a better job by modifying the tools/convert_imageset.cpp file to stack images or other data across channels. For example you could create a datum with 6 channels - the first 3 for your first image's RGB, and the second 3 for your second image's RGB. After reading this in using the IMAGE_DATA layer, you can split the stream into two images using a SLICE layer with a slice_point at index 3 along the slice_dim = 1 dimension. If further down the road, you decide that you want to load even more complex assortments of data, you'll understand the encoding scheme and can write your own decoding layer based off of src/caffe/layers/data_layer.cpp to gain full control of the pipeline.
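The channel-stacking idea can be sketched in NumPy, independently of Caffe. This is only an illustration of the data layout: `stack_pair` and `slice_pair` are hypothetical helper names, the random arrays stand in for real images, and the slicing function merely mimics what a SLICE layer with a slice point at index 3 along dimension 1 would do to an N x 6 x H x W blob.

```python
import numpy as np


def stack_pair(image1, image2):
    """Stack two H x W x 3 images into one 6 x H x W array (channels first)."""
    if image1.shape != image2.shape:
        raise ValueError("both images must have the same shape")
    chw1 = np.transpose(image1, (2, 0, 1))  # H x W x C -> C x H x W
    chw2 = np.transpose(image2, (2, 0, 1))
    # Channels 0-2 hold the first image, channels 3-5 the second.
    return np.concatenate([chw1, chw2], axis=0)


def slice_pair(batch, slice_point=3):
    """Split an N x 6 x H x W batch along the channel axis (axis 1)."""
    return batch[:, :slice_point], batch[:, slice_point:]


# Random stand-ins for two RGB images of height 4 and width 5.
rng = np.random.default_rng(0)
image1 = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)
image2 = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

stacked = stack_pair(image1, image2)   # one 6-channel datum
batch = stacked[np.newaxis]            # add a batch axis of size 1
first, second = slice_pair(batch)      # recover the two 3-channel images
```

Splitting at index 3 recovers exactly the two original images, which is the property the 6-channel datum relies on.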
