Problem description
I'm trying to replicate a large Caffe network in Keras (with the TensorFlow backend), but I'm having a lot of trouble doing it even for a single convolutional layer.
A simple convolution in general:
Let's say we had a 4D input with shape (1, 500, 500, 3), and we had to perform a single convolution on this input with 96 filters, a kernel size of 11, and 4x4 strides.
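For reference, with 'valid' padding the spatial output size follows the usual convolution arithmetic, which is where the 123 used in the snippets below comes from:
out_size = (500 - 11) // 4 + 1  # = 123 for 'valid' padding: floor((input - kernel) / stride) + 1
# So Keras (TensorFlow backend) produces a (1, 123, 123, 96) NHWC output,
# while Caffe produces a (1, 96, 123, 123) NCHW output.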
Let's set our weight and input variables:
import numpy as np

w = np.random.rand(11, 11, 3, 96)  # weights 1 (convolution kernel)
b = np.random.rand(96)             # weights 2 (bias)
x = np.random.rand(500, 500, 3)    # input image
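One more piece of context worth keeping in mind (not stated in the original post, but standard for both frameworks): Keras stores a Conv2D kernel as (kernel_h, kernel_w, in_channels, filters), whereas Caffe stores its weight blob as (num_output, channels, kernel_h, kernel_w). Converting between the two layouts is an axis permutation, for example:
# Hedged sketch: map the Keras-layout kernel w of shape (11, 11, 3, 96)
# onto Caffe's (96, 3, 11, 11) layout by permuting axes.
w_caffe = w.transpose(3, 2, 0, 1)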
Simple convolution in Keras:
This is how it could be defined in Keras:
import keras
import numpy as np
from keras.layers import Input
from keras.layers import Conv2D

inp = Input(shape=(500, 500, 3))
conv1 = Conv2D(filters=96, kernel_size=11, strides=(4, 4), activation=keras.activations.relu, padding='valid')(inp)
model = keras.Model(inputs=[inp], outputs=conv1)
model.layers[1].set_weights([w, b])  # set weights for the convolutional layer
predicted = model.predict([x.reshape(1, 500, 500, 3)])
print(predicted.reshape(1, 96, 123, 123))  # reshape the Keras output into Caffe's (1, 96, 123, 123) shape
Simple convolution in Caffe:
simple.prototxt:
name: "simple"
input: "inp"
input_shape {
dim: 1
dim: 3
dim: 500
dim: 500
}
layer {
name: "conv1"
type: "Convolution"
bottom: "inp"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 11
pad: 0
stride: 4
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
Caffe in Python:
import caffe
net = caffe.Net('simple.prototxt', caffe.TEST)
net.params['conv1'][0].data[...] = w.reshape(96, 3, 11, 11) # set weights 1
net.params['conv1'][1].data[...] = b # set weights 2 (bias)
net.blobs['inp'].reshape(1, 3, 500, 500) # reshape input layer to fit our input array x
print(net.forward(inp=x.reshape(1, 3, 500, 500)).get('conv1'))
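A quick way to see the mismatch described below, as a hedged sketch that assumes both snippets above were executed in the same Python session (the names net, x, and predicted are reused from those snippets):
# Caffe returns an NCHW array, while Keras with the TensorFlow backend returns NHWC.
caffe_out = net.forward(inp=x.reshape(1, 3, 500, 500)).get('conv1')  # shape (1, 96, 123, 123)
keras_out = predicted                                                # shape (1, 123, 123, 96)
print(np.allclose(caffe_out, keras_out.reshape(1, 96, 123, 123)))    # False: the values do not line up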
Problem:
If we execute both snippets of code, we will notice that the outputs differ from each other. I understand that there are a few differences, such as Caffe's symmetric padding, but I didn't even use padding here. Yet the output of Caffe is different from the output of Keras...
Why is this so? I know that the Theano backend doesn't use correlation like Caffe does, and hence it requires the kernel to be rotated by 180 degrees, but is it the same for TensorFlow? From what I know, both TensorFlow and Caffe use cross-correlation instead of convolution.
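For what it's worth, the 180-degree rotation mentioned for Theano would just be a flip of the two spatial axes of the kernel; a minimal illustration using the w defined above:
w_flipped = w[::-1, ::-1, :, :]  # each 11x11 kernel rotated by 180 degrees (both spatial axes reversed)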
How could I make two identical models in Keras and Caffe that use convolution?
Any help would be appreciated!
Recommended Answer
I found the problem, but I'm not sure how to fix it yet...
The difference between these two convolutional layers is the alignment of their items. This alignment problem only occurs when the number of filters equals some N such that N > 1 && N > S, where S is the dimension of the filter. In other words, the problem only occurs when the convolution produces a multi-dimensional array whose number of rows and number of columns are both greater than 1.
To see this, I simplified my input and output data so that we can better analyze the mechanics of both layers.
simple.prototxt:
input: "input"
input_shape {
dim: 1
dim: 1
dim: 2
dim: 2
}
layer {
name: "conv1"
type: "Convolution"
bottom: "input"
top: "conv1"
convolution_param {
num_output: 2
kernel_size: 1
pad: 0
stride: 1
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
simple.py:
import keras
import caffe
import numpy as np
from keras.layers import Input, Conv2D
from keras.activations import relu
from keras import Model
filters = 2 # greater than 1 and ker_size
ker_size = 1
_input = np.arange(2 * 2).reshape(2, 2)
_weights = [np.reshape([[2 for _ in range(filters)] for _ in range(ker_size * ker_size)], (ker_size, ker_size, 1, filters)),
            np.reshape([0 for _ in range(filters)], (filters,))]  # Keras weights: the kernel is an array of 2's, the bias an array of 0's
_weights_caffe = [_weights[0].T, _weights[1].T] # just transpose them for Caffe
# Keras Setup
keras_input = Input(shape=(2, 2, 1), dtype='float32')
keras_conv = Conv2D(filters=filters, kernel_size=ker_size, strides=(1, 1), activation=relu, padding='valid')(keras_input)
model = Model(inputs=[keras_input], outputs=keras_conv)
model.layers[1].set_weights([_weights[0], _weights[1]])
# Caffe Setup
net = caffe.Net("simpler.prototxt", caffe.TEST)
net.params['conv1'][0].data[...] = _weights_caffe[0]
net.params['conv1'][1].data[...] = _weights_caffe[1]
net.blobs['input'].data[...] = _input.reshape(1, 1, 2, 2)
# Predictions
print("Input:\n---")
print(_input)
print(_input.shape)
print("\n")
print("Caffe:\n---")
print(net.forward()['conv1'])
print(net.forward()['conv1'].shape)
print("\n")
print("Keras:\n---")
print(model.predict([_input.reshape(1, 2, 2, 1)]))
print(model.predict([_input.reshape(1, 2, 2, 1)]).shape)
print("\n")
Output:
Input:
---
[[0 1]
[2 3]]
(2, 2)
Caffe:
---
[[[[0. 2.]
[4. 6.]]
[[0. 2.]
[4. 6.]]]]
(1, 2, 2, 2)
Keras:
---
[[[[0. 0.]
[2. 2.]]
[[4. 4.]
[6. 6.]]]]
(1, 2, 2, 2)
Analysis:
If you look at the output of the Caffe model, you'll notice that our 2x2 array is first doubled (so that we have 2 2x2 arrays), and then matrix multiplication is performed on each of those two arrays with our weight matrix. Something like this:
Original:
[[[[0. 2.]
[4. 6.]]
[[0. 2.]
[4. 6.]]]]
Transformed:
[[[[(0 * 2) (2 * 2)]
[(4 * 2) (6 * 2)]]
[[(0 * 2) (2 * 2)]
[(4 * 2) (6 * 2)]]]]
TensorFlow does something different: it seems to first align the 2D vectors of the output in ascending order after doing the same thing Caffe did. This seems like weird behavior, and I'm unable to understand why they would do such a thing.
I have answered my own question about the cause of the problem, but I'm not aware of any clean solution yet. I still don't find my answer satisfying enough, so I'm going to accept the answer that has an actual solution.
The only solution I know of is creating a custom layer, which is not a very neat solution to me.
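If the discrepancy really is just the channel ordering (an assumption on my part rather than something established above), one lighter-weight possibility than a fully custom layer might be to append Keras's built-in Permute layer so the model emits channels-first output; a rough sketch reusing the simple.py variables:
from keras.layers import Permute

# Reuse the convolution built in simple.py and reorder its output to NCHW.
nchw_out = Permute((3, 1, 2))(keras_conv)                   # NHWC -> (batch, C, H, W)
model_nchw = Model(inputs=[keras_input], outputs=nchw_out)  # shares the Conv2D weights already set above
print(model_nchw.predict([_input.reshape(1, 2, 2, 1)]))     # should come out in Caffe's ordering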