Question
I want to build an autoencoder where each layer in the encoder has the same meaning as the corresponding layer in the decoder. So if the autoencoder is perfectly trained, the values of those layers should be roughly the same.
So let's say the autoencoder consists of e1 -> e2 -> e3 -> d2 -> d1, where e1 is the input and d1 is the output. A normal autoencoder trains d1 to reproduce e1, but I want the additional constraint that e2 and d2 are the same. Therefore I want an additional backpropagation path which leads from d2 to e2 and trains at the same time as the normal path from d1 to e1. (d stands for decoder, e for encoder.)
I tried to use the error between e2 and d2 as a regularization term, with the CustomRegularization layer from the first answer in this link: https://github.com/keras-team/keras/issues/5563. I also use this for the error between e1 and d1 instead of the normal path.
The following code is written so that more than one intermediate layer can be handled; it uses 4 layers. The commented-out code is a normal autoencoder which only propagates from start to end.
from keras.layers import Dense
import numpy as np
from keras.datasets import mnist
from keras.models import Model
from keras.engine.topology import Layer
from keras import objectives
from keras.layers import Input
import keras
import matplotlib.pyplot as plt
#A layer which can be given as an output to force a regularization term between two layers
class CustomRegularization(Layer):
    def __init__(self, **kwargs):
        super(CustomRegularization, self).__init__(**kwargs)

    def call(self, x, mask=None):
        ld = x[0]
        rd = x[1]
        bce = objectives.binary_crossentropy(ld, rd)
        loss2 = keras.backend.sum(bce)
        self.add_loss(loss2, x)
        return bce

    def get_output_shape_for(self, input_shape):
        return (input_shape[0][0], 1)

def zero_loss(y_true, y_pred):
    return keras.backend.zeros_like(y_pred)
#Create regularization layer between two corresponding layers of encoder and decoder
def buildUpDownRegularization(layerNo, input, up_layers, down_layers):
    for i in range(0, layerNo):
        input = up_layers[i](input)
    start = input
    for i in range(layerNo, len(up_layers)):
        input = up_layers[i](input)
    for j in range(0, len(down_layers) - layerNo):
        input = down_layers[j](input)
    end = input
    cr = CustomRegularization()([start, end])
    return cr
# Define shape of the network, layers, some hyperparameters and training data
sizes = [784, 400, 200, 100, 50]
up_layers = []
down_layers = []
for i in range(1, len(sizes)):
    layer = Dense(units=sizes[i], activation='sigmoid', input_dim=sizes[i-1])
    up_layers.append(layer)
for i in range(len(sizes)-2, -1, -1):
    layer = Dense(units=sizes[i], activation='sigmoid', input_dim=sizes[i+1])
    down_layers.append(layer)
batch_size = 128
num_classes = 10
epochs = 100
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
x_train = x_train.reshape([x_train.shape[0], 28*28])
x_test = x_test.reshape([x_test.shape[0], 28*28])
y_train = x_train
y_test = x_test
optimizer = keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
"""
### Normal autoencoder like in base mnist example
model = keras.models.Sequential()
for layer in up_layers:
    model.add(layer)
for layer in down_layers:
    model.add(layer)
model.compile(optimizer=optimizer, loss=keras.backend.binary_crossentropy)
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)
score = model.evaluate(x_test, y_test, verbose=0)
#print('Test loss:', score[0])
#print('Test accuracy:', score[1])
decoded_imgs = model.predict(x_test)
n = 10 # how many digits we will display
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
"""
### My autoencoder where each subpart is also an autoencoder
#This part is only here because the model needs a path from start to end; content-wise it should do nothing
output = input = Input(shape=(sizes[0],))
for i in range(0, len(up_layers)):
    output = up_layers[i](output)
for i in range(0, len(down_layers)):
    output = down_layers[i](output)
crs = [output]
losses = [zero_loss]
#Build the regularization layer
for i in range(len(up_layers)):
    crs.append(buildUpDownRegularization(i, input, up_layers, down_layers))
    losses.append(zero_loss)
#Create and train model with adapted training data
network = Model([input], crs)
optimizer = keras.optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
network.compile(loss=losses, optimizer=optimizer)
dummy_train = np.zeros([y_train.shape[0], 1])
dummy_test = np.zeros([y_test.shape[0], 1])
training_data = [y_train]
test_data = [y_test]
for i in range(len(network.outputs)-1):
    training_data.append(dummy_train)
    test_data.append(dummy_test)
network.fit(x_train, training_data, batch_size=batch_size, epochs=epochs,verbose=1, validation_data=(x_test, test_data))
score = network.evaluate(x_test, test_data, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
decoded_imgs = network.predict(x_test)
n = 10 # how many digits we will display
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[0][i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
If you run the code as is, it will show that the reproduction ability is no longer there in my code. I expect behavior similar to the commented-out code (when uncommented), which shows a normal autoencoder.
As mentioned in the answer, this works well with MSE instead of cross-entropy and a learning rate of .01; 100 epochs with that setting produce really good results.
Edit 2: I would like the backpropagation to work as in this image (https://imgur.com/OOo757x). So the backpropagation of the loss of a certain layer stops at the corresponding layer. I think I didn't make this clear before, and I don't know whether the code currently does that.
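One way to read the illustration is that the loss between a pair e_i/d_i should update the decoder and the encoder layers above e_i, but its backpropagation should stop at e_i itself. Below is a minimal sketch of that idea under the same old Keras API as the code above; the helper name and the Lambda/stop_gradient wrapping are my additions, not something the original code does:
def buildStoppedRegularization(layerNo, input, up_layers, down_layers):
    # Run the encoder up to the corresponding layer e_layerNo
    for i in range(0, layerNo):
        input = up_layers[i](input)
    # Block the gradient here: the loss built below can still update the layers
    # above e_layerNo and the decoder, but it stops at this activation.
    stopped = keras.layers.Lambda(lambda t: keras.backend.stop_gradient(t))(input)
    output = stopped
    for i in range(layerNo, len(up_layers)):
        output = up_layers[i](output)
    for j in range(0, len(down_layers) - layerNo):
        output = down_layers[j](output)
    # Compare the (gradient-blocked) encoder activation with the decoder activation
    return CustomRegularization()([stopped, output])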
Edit 3: Although this code runs and returns a good-looking solution, the CustomRegularization layer is not doing what I thought it would do, so it does not do the same thing as described above.
Answer
It seems like the main issue is the use of binary cross-entropy to minimize the difference between encoder and decoder. The internal representation in the network is not going to be a single class probability like the output might be if you were classifying MNIST digits. I was able to get your network to output some reasonable-looking reconstructions with these simple changes:
- Using objectives.mean_squared_error instead of objectives.binary_crossentropy in the CustomRegularization class (sketched below)
- Changing the number of epochs to 5
- Changing the learning rate to .01
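For concreteness, here is a sketch of what these changes could look like in the code from the question (only the call method of CustomRegularization changes; the rest of the class stays as it is):
    # Change 1 (sketch): swap binary cross-entropy for mean squared error
    def call(self, x, mask=None):
        ld = x[0]
        rd = x[1]
        mse = objectives.mean_squared_error(ld, rd)
        loss2 = keras.backend.sum(mse)
        self.add_loss(loss2, x)
        return mse
# Changes 2 and 3 (sketch): fewer epochs and a larger learning rate
epochs = 5
optimizer = keras.optimizers.Adam(lr=0.01, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)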
Changes 2 and 3 were made simply to speed up testing. Change 1 is the key here. Cross-entropy is designed for problems where there is a binary "ground truth" variable and an estimate of that variable. However, you do not have a binary truth value in the middle of your network, only at the output layer, so a cross-entropy loss function in the middle of the network doesn't make much sense (at least to me): it would be trying to measure entropy for a variable that isn't binary. Mean squared error, on the other hand, is more generic and should work for this case, since you are simply minimizing the difference between two real values. In essence, the middle of the network is performing regression (minimizing the difference between the activations of two continuous-valued layers), not classification, so it needs a loss function that is appropriate for regression.
I also want to suggest that there may be a better approach to accomplish what you want. If you really want the encoder and decoder to be exactly the same, you can share weights between them. Then they will be identical, not just highly similar, and your model will have fewer parameters to train. There is a decent explanation of shared (tied) weights autoencoders with Keras here if you're curious.
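As a rough sketch of the tied-weights idea (the DenseTied class below is my own illustration, not taken from the linked post), a decoder layer can reuse the transpose of a given encoder Dense layer's kernel, so encoder and decoder share one weight matrix and only the decoder bias is trained separately. The tied encoder layer has to be built (i.e. already applied to an input) before the decoder layer is called:
class DenseTied(Layer):
    def __init__(self, tied_to, activation='sigmoid', **kwargs):
        self.tied_to = tied_to  # encoder Dense layer whose kernel is reused (transposed)
        self.activation = keras.activations.get(activation)
        super(DenseTied, self).__init__(**kwargs)

    def build(self, input_shape):
        # Only a bias is created here; the weight matrix is borrowed from the tied layer.
        output_dim = keras.backend.int_shape(self.tied_to.kernel)[0]
        self.bias = self.add_weight(name='bias', shape=(output_dim,), initializer='zeros')
        super(DenseTied, self).build(input_shape)

    def call(self, inputs):
        output = keras.backend.dot(inputs, keras.backend.transpose(self.tied_to.kernel))
        return self.activation(output + self.bias)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], keras.backend.int_shape(self.tied_to.kernel)[0])

# Usage sketch: tie each decoder layer to the matching encoder layer, e.g.
# down_layers = [DenseTied(tied_to=up_layers[i]) for i in range(len(up_layers) - 1, -1, -1)]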
Reading your code, it does seem like it is doing what you want in your illustration, but I am not really sure how to verify that.