Problem Description
In keras, is it possible to share weights between two layers, but to have other parameters differ? Consider the following (admittedly a bit contrived) example:
conv1 = Conv2D(64, 3, input_shape=input_shape, padding='same')
conv2 = Conv2D(64, 3, input_shape=input_shape, padding='valid')
Notice that the layers are identical except for the padding. Can I get keras to use the same weights for both? (i.e. also train the network accordingly?)
I've looked at the keras doc, and the section on shared layers seems to imply that sharing works only if the layers are completely identical.
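For reference, the standard way to share weights in Keras is to reuse the same layer instance in the functional API, which ties every parameter of the layer (padding included), so it cannot express the setup above. A minimal sketch:

from keras.layers import Input, Conv2D

input_a = Input(shape=(299, 299, 3))
input_b = Input(shape=(299, 299, 3))
shared_conv = Conv2D(64, 3, padding='same')  # one instance, one set of weights
out_a = shared_conv(input_a)  # both calls reuse the same kernel and bias
out_b = shared_conv(input_b)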
Recommended Answer
To my knowledge, this cannot be done through the common "API level" of Keras usage. However, if you dig a bit deeper, there are some (ugly) ways to share the weights.
First of all, the weights of the Conv2D layers are created inside the build() function, by calling add_weight():
self.kernel = self.add_weight(shape=kernel_shape,
initializer=self.kernel_initializer,
name='kernel',
regularizer=self.kernel_regularizer,
constraint=self.kernel_constraint)
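As a consequence, a layer has no weight variables at all until build() has run; a quick check (a sketch assuming the Keras 2 API, matching the snippet below):

from keras.layers import Conv2D

conv = Conv2D(64, 3, padding='same')
print(conv.weights)          # [] -- no variables exist before build()
conv.build((None, 299, 299, 3))
print(len(conv.weights))     # 2 -- kernel and bias were created by add_weight()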
For your provided usage (i.e., the default trainable/constraint/regularizer/initializer), add_weight() does nothing special beyond appending the weight variable to _trainable_weights:
weight = K.variable(initializer(shape), dtype=dtype, name=name)
...
self._trainable_weights.append(weight)
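This list is what the layer later reports as its trainable_weights, which is where the optimizer picks up the variables to update; that is why the steps below manipulate _trainable_weights directly. A small check (a sketch, assuming a built, trainable layer in Keras 2):

from keras.layers import Conv2D

conv = Conv2D(64, 3, padding='same')
conv.build((None, 299, 299, 3))
# for a trainable layer, trainable_weights is backed by _trainable_weights
print(conv.trainable_weights == conv._trainable_weights)  # True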
Finally, since build() is only called inside __call__() if the layer hasn't been built yet, shared weights between layers can be created as follows:
- Call conv1.build() to initialize the conv1.kernel and conv1.bias variables to be shared.
- Call conv2.build() to initialize the layer.
- Replace conv2.kernel and conv2.bias by conv1.kernel and conv1.bias.
- Remove conv2.kernel and conv2.bias from conv2._trainable_weights.
- Append conv1.kernel and conv1.bias to conv2._trainable_weights.
- Finish the model definition. Here conv2.__call__() will be called; however, since conv2 has already been built, the weights are not going to be re-initialized.
The following code snippet may be helpful:
import numpy as np
from keras import backend as K
from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense, concatenate
from keras.models import Model

def create_shared_weights(conv1, conv2, input_shape):
    with K.name_scope(conv1.name):
        conv1.build(input_shape)
    with K.name_scope(conv2.name):
        conv2.build(input_shape)
    # point conv2 at conv1's variables and register them as its trainables
    conv2.kernel = conv1.kernel
    conv2.bias = conv1.bias
    conv2._trainable_weights = []
    conv2._trainable_weights.append(conv2.kernel)
    conv2._trainable_weights.append(conv2.bias)

# check if weights are successfully shared
input_img = Input(shape=(299, 299, 3))
conv1 = Conv2D(64, 3, padding='same')
conv2 = Conv2D(64, 3, padding='valid')
create_shared_weights(conv1, conv2, input_img._keras_shape)
print(conv2.weights == conv1.weights)  # True

# check if weights remain equal after model fitting
left = conv1(input_img)
right = conv2(input_img)
left = GlobalAveragePooling2D()(left)
right = GlobalAveragePooling2D()(right)
merged = concatenate([left, right])
output = Dense(1)(merged)
model = Model(input_img, output)
model.compile(loss='binary_crossentropy', optimizer='adam')
X = np.random.rand(5, 299, 299, 3)
Y = np.random.randint(2, size=5)
model.fit(X, Y)
print([np.all(w1 == w2) for w1, w2 in zip(conv1.get_weights(), conv2.get_weights())])  # [True, True]
One drawback of this hacky weight-sharing is that the weights will not remain shared after model saving/loading. This will not affect prediction, but it may be problematic if you want to load the trained model for further fine-tuning.
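If that matters, one possible workaround (not from the original answer, just a sketch) is to rebuild the tied architecture with create_shared_weights() first and then restore only the saved weight values, so the variables stay shared:

# hypothetical workaround: recreate the tied architecture, then load values only
input_img = Input(shape=(299, 299, 3))
conv1 = Conv2D(64, 3, padding='same')
conv2 = Conv2D(64, 3, padding='valid')
create_shared_weights(conv1, conv2, input_img._keras_shape)
# ... rebuild the rest of the model exactly as above ...
model.load_weights('model.h5')  # 'model.h5' is a placeholder for your saved file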