Can I share weights between Keras layers but have other parameters differ?

Problem Description

In keras, is it possible to share weights between two layers, but to have other parameters differ? Consider the following (admittedly a bit contrived) example:

conv1 = Conv2D(64, 3, input_shape=input_shape, padding='same')
conv2 = Conv2D(64, 3, input_shape=input_shape, padding='valid')

Notice that the layers are identical except for the padding. Can I get keras to use the same weights for both? (i.e. also train the network accordingly?)

I've looked at the keras doc, and the section on shared layers seems to imply that sharing works only if the layers are completely identical.
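
That is indeed the documented pattern: sharing in Keras is done by reusing a single layer instance, so every configuration option (including padding) is tied to that one instance. A minimal sketch of the documented approach, with hypothetical input shapes:

from keras.layers import Input, Conv2D

# The documented way to share weights: one layer instance applied to two inputs.
# Its configuration (filters, kernel size, padding, ...) cannot differ per call.
shared_conv = Conv2D(64, 3, padding='same')

x1 = Input(shape=(32, 32, 3))
x2 = Input(shape=(32, 32, 3))
out1 = shared_conv(x1)  # uses shared_conv's weights
out2 = shared_conv(x2)  # same weights and, necessarily, the same padding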

Recommended Answer

To my knowledge, this cannot be done at the common "API level" of Keras usage. However, if you dig a bit deeper, there are some (ugly) ways to share the weights.

First of all, the weights of the Conv2D layers are created inside the build() function, by calling add_weight():

    self.kernel = self.add_weight(shape=kernel_shape,
                                  initializer=self.kernel_initializer,
                                  name='kernel',
                                  regularizer=self.kernel_regularizer,
                                  constraint=self.kernel_constraint)

For your provided usage (i.e., default trainable/constraint/regularizer/initializer), add_weight() does nothing special but appending the weight variables to _trainable_weights:

    weight = K.variable(initializer(shape), dtype=dtype, name=name)
    ...
        self._trainable_weights.append(weight)

Finally, since build() is only called inside __call__() if the layer hasn't been built, shared weights between layers can be created by:

  1. Call conv1.build() to initialize the conv1.kernel and conv1.bias variables to be shared.
  2. Call conv2.build() to initialize the layer.
  3. Replace conv2.kernel and conv2.bias by conv1.kernel and conv1.bias.
  4. Remove conv2.kernel and conv2.bias from conv2._trainable_weights.
  5. Append conv1.kernel and conv1.bias to conv2._trainable_weights.
  6. Finish model definition. Here conv2.__call__() will be called; however, since conv2 has already been built, the weights are not going to be re-initialized.

The following code snippet may be helpful:

import numpy as np
from keras import backend as K
from keras.layers import Input, Conv2D, Dense, GlobalAveragePooling2D, concatenate
from keras.models import Model


def create_shared_weights(conv1, conv2, input_shape):
    # Build both layers so that their kernel/bias variables exist (steps 1-2).
    with K.name_scope(conv1.name):
        conv1.build(input_shape)
    with K.name_scope(conv2.name):
        conv2.build(input_shape)
    # Point conv2 at conv1's variables and rebuild its trainable-weights list (steps 3-5).
    conv2.kernel = conv1.kernel
    conv2.bias = conv1.bias
    conv2._trainable_weights = []
    conv2._trainable_weights.append(conv2.kernel)
    conv2._trainable_weights.append(conv2.bias)

# check if weights are successfully shared
input_img = Input(shape=(299, 299, 3))
conv1 = Conv2D(64, 3, padding='same')
conv2 = Conv2D(64, 3, padding='valid')
create_shared_weights(conv1, conv2, input_img._keras_shape)
print(conv2.weights == conv1.weights)  # True

# check if weights are equal after model fitting
left = conv1(input_img)
right = conv2(input_img)
left = GlobalAveragePooling2D()(left)
right = GlobalAveragePooling2D()(right)
merged = concatenate([left, right])
output = Dense(1)(merged)
model = Model(input_img, output)
model.compile(loss='binary_crossentropy', optimizer='adam')

X = np.random.rand(5, 299, 299, 3)
Y = np.random.randint(2, size=5)
model.fit(X, Y)
print([np.all(w1 == w2) for w1, w2 in zip(conv1.get_weights(), conv2.get_weights())])  # [True, True]

One drawback of this hacky weight-sharing is that the weights will not remain shared after model saving/loading. This will not affect prediction, but it may be problematic if you want to load the trained model for further fine-tuning.
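
A quick way to see this, sketched under the assumption that the model defined above is saved and reloaded (the file name shared_conv.h5 is arbitrary):

from keras.models import load_model

model.save('shared_conv.h5')
restored = load_model('shared_conv.h5')

r_conv1 = restored.get_layer(conv1.name)
r_conv2 = restored.get_layer(conv2.name)

# The reloaded layers hold separate variables, so they are no longer tied...
print(r_conv1.kernel is r_conv2.kernel)  # False
# ...even though their values are still equal right after loading.
print([np.all(w1 == w2) for w1, w2 in zip(r_conv1.get_weights(), r_conv2.get_weights())])  # [True, True]

One possible workaround is to rebuild the model definition with create_shared_weights() and then restore only the weights via model.load_weights(), so that the variables are tied again before any further fine-tuning.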
