

I'm new to Python 3.7.7 and TensorFlow 2.1.0, and I'm trying to understand Conv2DTranspose. I have tried this code:

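from tensorflow import keras
from tensorflow.keras.layers import Input, Conv2D, Conv2DTranspose, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def vgg16_decoder(input_size = (7, 7, 512)):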
    inputs = Input(input_size, name = 'input')

    conv1 = Conv2DTranspose(512, (2, 2), dilation_rate = 2, name = 'conv1')(inputs)

    model = Model(inputs = inputs, outputs = conv1, name = 'vgg-16_decoder')

    opt = Adam(lr=0.001)
    model.compile(optimizer=opt, loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])

    return model


Model: "vgg-16_decoder"
Layer (type)                 Output Shape              Param #
input (InputLayer)           (None, 7, 7, 512)         0
conv1 (Conv2DTranspose)      (None, 9, 9, 512)         1049088
Total params: 1,049,088
Trainable params: 1,049,088
Non-trainable params: 0

But I want conv1 to output (None, 14, 14, 512).


I have changed filter size to (3, 3) and I get this summary:

Model: "vgg-16_decoder"
Layer (type)                 Output Shape              Param #
input (InputLayer)           (None, 7, 7, 512)         0
conv1 (Conv2DTranspose)      (None, 11, 11, 512)       2359808
Total params: 2,359,808
Trainable params: 2,359,808
Non-trainable params: 0

This is what I'm trying to reproduce with Conv2DTranspose:

# A piece of code from U-NET implementation

up6 = Conv2D(512, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal', name = 'up6')(UpSampling2D(size = (2,2), name = 'upsp1')(drop5))


drop5 (Dropout)                 (None, 16, 16, 1024) 0           conv5_2[0][0]
upsp1 (UpSampling2D)            (None, 32, 32, 1024) 0           drop5[0][0]
up6 (Conv2D)                    (None, 32, 32, 512)  2097664     upsp1[0][0]


It upsamples its input by a factor of 2 and changes the number of filters.



Can I do this with Conv2DTranspose? I think I did, but I don't understand what I did:

conv1 = Conv2DTranspose(512, (2, 2), strides = 2, name = 'conv1')(inputs)


Model: "vgg-16_decoder"
Layer (type)                 Output Shape              Param #
input (InputLayer)           (None, 7, 7, 512)         0
conv1 (Conv2DTranspose)      (None, 14, 14, 512)       1049088
Total params: 1,049,088
Trainable params: 1,049,088
Non-trainable params: 0


Corrections, or an explanation of what I have done here, are welcome.



By the way, I'm trying to create a VGG-16 decoder. This is the code for my VGG-16 encoder:

def vgg16_encoder(input_size = (224,224,3)):
    inputs = Input(input_size, name = 'input')

    conv1 = Conv2D(64, (3, 3), activation = 'relu', padding = 'same', name ='conv1_1')(inputs)
    conv1 = Conv2D(64, (3, 3), activation = 'relu', padding = 'same', name ='conv1_2')(conv1)
    pool1 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_1')(conv1)

    conv2 = Conv2D(128, (3, 3), activation = 'relu', padding = 'same', name ='conv2_1')(pool1)
    conv2 = Conv2D(128, (3, 3), activation = 'relu', padding = 'same', name ='conv2_2')(conv2)
    pool2 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_2')(conv2)

    conv3 = Conv2D(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_1')(pool2)
    conv3 = Conv2D(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_2')(conv3)
    conv3 = Conv2D(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_3')(conv3)
    pool3 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_3')(conv3)

    conv4 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_1')(pool3)
    conv4 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_2')(conv4)
    conv4 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_3')(conv4)
    pool4 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_4')(conv4)

    conv5 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_1')(pool4)
    conv5 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_2')(conv5)
    conv5 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_3')(conv5)
    pool5 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_5')(conv5)

    opt = Adam(lr=0.001)

    model = Model(inputs = inputs, outputs = pool5, name = 'vgg-16_encoder')

    model.compile(optimizer=opt, loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])

    return model



When we design an encoder-decoder architecture, the decoder needs operations that reverse what the encoder did. Say the encoder uses Conv2D and pooling (common in architectures like VGG). In the decoder we use Conv2DTranspose, which can be thought of as the reverse of Conv2D, and UpSampling2D, which reverses pooling in terms of shape only: pooling discards information, so it is not truly invertible.
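
To make the "reverse of Conv2D" idea a bit more concrete, here is a minimal pure-Python sketch of a 1-D transposed convolution with 'valid' padding. It is only an illustration under simplifying assumptions (one channel, no bias, kernel at least as long as the stride), and `conv_transpose_1d` is a made-up helper, not a Keras function. Each input value stamps a scaled copy of the kernel into the output, and the stride sets how far apart the stamps land.

```python
def conv_transpose_1d(x, kernel, stride=1):
    """Transposed 1-D convolution ('valid' padding) written as a scatter-add.

    Hypothetical helper for illustration; assumes len(kernel) >= stride.
    """
    k = len(kernel)
    out = [0.0] * ((len(x) - 1) * stride + k)
    for i, value in enumerate(x):
        for j, weight in enumerate(kernel):
            # Every input value adds a scaled copy of the kernel,
            # starting at position i * stride in the output.
            out[i * stride + j] += value * weight
    return out


x = [1.0, 2.0, 3.0]
kernel = [1.0, 1.0]

print(conv_transpose_1d(x, kernel, stride=1))  # [1.0, 3.0, 5.0, 3.0]
print(conv_transpose_1d(x, kernel, stride=2))  # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

With a kernel of length 2 and a stride of 2 the stamps do not overlap, so the output is exactly twice as long as the input. That is the same effect as Conv2DTranspose(512, (2, 2), strides = 2) turning 7x7 into 14x14 in the question.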


from tensorflow.keras.layers import *
from tensorflow.keras.models import *

def encoder_decoder_conv(input_size = (224,224,3)):
    ip = Input(input_size)
    # encoder
    conv = Conv2D(512, (3,3))(ip) # look here, the default padding ('valid') is used, so 224 shrinks to 222
    # decoder
    inv_conv = Conv2DTranspose(3, (3,3))(conv) # default padding again, so 222 grows back to 224
    # simple model
    model = Model(ip, inv_conv)
    return model

model1 = encoder_decoder_conv()
model1.summary()

def encoder_decoder_pooling(input_size = (224,224,3)):
    ip = Input(input_size)
    # encoder
    pool = MaxPool2D((2,2))(ip) # 2x2 pooling halves height and width
    # decoder
    inv_pool = UpSampling2D((2,2))(pool) # 2x2 upsampling doubles them again
    # simple model
    model = Model(ip, inv_pool)
    return model

model2 = encoder_decoder_pooling()
model2.summary()

Model: "model_1"
Layer (type)                 Output Shape              Param #
input_2 (InputLayer)         [(None, 224, 224, 3)]     0
conv2d_1 (Conv2D)            (None, 222, 222, 512)     14336
conv2d_transpose_1 (Conv2DTr (None, 224, 224, 3)       13827
Total params: 28,163
Trainable params: 28,163
Non-trainable params: 0

Model: "model_2"
Layer (type)                 Output Shape              Param #
input_3 (InputLayer)         [(None, 224, 224, 3)]     0
max_pooling2d (MaxPooling2D) (None, 112, 112, 3)       0
up_sampling2d (UpSampling2D) (None, 224, 224, 3)       0
Total params: 0
Trainable params: 0
Non-trainable params: 0

As you can see, in the first model Conv2DTranspose reverses the Conv2D operation and recovers exactly the input shape (224, 224, 3).
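
The shapes in these summaries can be checked by hand. The sketch below is plain Python with two hypothetical helpers (they are not part of Keras) that follow the shape rule Keras appears to apply for 'valid' and 'same' padding when no output_padding is given. Treat it as a sanity-check tool under those assumptions, not as the library's own code.

```python
def conv_output_length(n, kernel, stride=1, padding="valid"):
    """Size along one axis after Conv2D or MaxPooling2D (hypothetical helper)."""
    if padding == "same":
        return -(-n // stride)  # ceil(n / stride)
    return (n - kernel) // stride + 1


def conv_transpose_output_length(n, kernel, stride=1, padding="valid", dilation=1):
    """Size along one axis after Conv2DTranspose (hypothetical helper)."""
    # A dilated kernel covers kernel + (kernel - 1) * (dilation - 1) positions.
    kernel = kernel + (kernel - 1) * (dilation - 1)
    if padding == "same":
        return n * stride
    return n * stride + max(kernel - stride, 0)


# The three attempts from the question, all starting from a 7x7 feature map.
print(conv_transpose_output_length(7, 2, dilation=2))  # 9
print(conv_transpose_output_length(7, 3, dilation=2))  # 11
print(conv_transpose_output_length(7, 2, stride=2))    # 14

# model1: a 3x3 'valid' Conv2D followed by a 3x3 'valid' Conv2DTranspose.
shrunk = conv_output_length(224, 3)
print(shrunk, conv_transpose_output_length(shrunk, 3))  # 222 224
```

So dilation_rate only widens the kernel and adds a border to the output, while strides is what multiplies the spatial size.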


For model2, UpSampling2D reverses the pooling operation in terms of feature-map shape.
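
Only the shape comes back, not the values. A small pure-Python sketch of this, assuming 2x2 max pooling and nearest-neighbour upsampling (the default interpolation of UpSampling2D); `max_pool_2x2` and `upsample_2x2` are illustrative helpers, not Keras functions:

```python
def max_pool_2x2(grid):
    """Keep the maximum of every non-overlapping 2x2 block."""
    return [
        [max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
         for c in range(0, len(grid[0]), 2)]
        for r in range(0, len(grid), 2)
    ]


def upsample_2x2(grid):
    """Nearest-neighbour upsampling: repeat every value in a 2x2 block."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


image = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 10, 13, 14],
    [11, 12, 15, 16],
]

pooled = max_pool_2x2(image)
restored = upsample_2x2(pooled)

print(pooled)             # [[4, 8], [12, 16]]
print(restored)           # [[4, 4, 8, 8], [4, 4, 8, 8], [12, 12, 16, 16], [12, 12, 16, 16]]
print(restored == image)  # False
```

The restored grid is 4x4 again, but three of every four original values are gone, which is why a decoder usually follows each upsampling step with trainable layers.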


So, since you're trying to build a VGG decoder, and VGG consists mostly of Conv2D and MaxPooling2D layers, all you have to do is reverse those operations with Conv2DTranspose and UpSampling2D so that you recover the input shape (224, 224, 3) from the feature-map shape (7, 7, 512).


Finally, there are several variations of the decoder part, but I think this is the VGG-16 decoder you're looking for.

def vgg16_decoder(input_size = (7,7,512)):
    inputs = Input(input_size, name = 'input')

    pool5 = UpSampling2D((2,2), name = 'pool_5')(inputs)
    conv5 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_3')(pool5)
    conv5 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_2')(conv5)
    conv5 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_1')(conv5)

    pool4 = UpSampling2D((2,2), name = 'pool_4')(conv5)
    conv4 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_3')(pool4)
    conv4 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_2')(conv4)
    conv4 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_1')(conv4)

    pool3 = UpSampling2D((2,2), name = 'pool_3')(conv4)
    conv3 = Conv2DTranspose(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_3')(pool3)
    conv3 = Conv2DTranspose(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_2')(conv3)
    conv3 = Conv2DTranspose(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_1')(conv3)

    pool2 = UpSampling2D((2,2), name = 'pool_2')(conv3)
    conv2 = Conv2DTranspose(128, (3, 3), activation = 'relu', padding = 'same', name ='conv2_2')(pool2)
    conv2 = Conv2DTranspose(128, (3, 3), activation = 'relu', padding = 'same', name ='conv2_1')(conv2)

    pool1 = UpSampling2D((2,2), name = 'pool_1')(conv2)
    conv1 = Conv2DTranspose(64, (3, 3), activation = 'relu', padding = 'same', name ='conv1_2')(pool1)
    conv1 = Conv2DTranspose(3, (3, 3), activation = 'relu', padding = 'same', name ='conv1_1')(conv1) # to get 3 channels

    model = Model(inputs = inputs, outputs = conv1, name = 'vgg-16_decoder')

    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

    return model

model = vgg16_decoder()
model.summary()

Model: "vgg-16_decoder"
Layer (type)                 Output Shape              Param #
input (InputLayer)           [(None, 7, 7, 512)]       0
pool_5 (UpSampling2D)        (None, 14, 14, 512)       0
conv5_3 (Conv2DTranspose)    (None, 14, 14, 512)       2359808
conv5_2 (Conv2DTranspose)    (None, 14, 14, 512)       2359808
conv5_1 (Conv2DTranspose)    (None, 14, 14, 512)       2359808
pool_4 (UpSampling2D)        (None, 28, 28, 512)       0
conv4_3 (Conv2DTranspose)    (None, 28, 28, 512)       2359808
conv4_2 (Conv2DTranspose)    (None, 28, 28, 512)       2359808
conv4_1 (Conv2DTranspose)    (None, 28, 28, 512)       2359808
pool_3 (UpSampling2D)        (None, 56, 56, 512)       0
conv3_3 (Conv2DTranspose)    (None, 56, 56, 256)       1179904
conv3_2 (Conv2DTranspose)    (None, 56, 56, 256)       590080
conv3_1 (Conv2DTranspose)    (None, 56, 56, 256)       590080
pool_2 (UpSampling2D)        (None, 112, 112, 256)     0
conv2_2 (Conv2DTranspose)    (None, 112, 112, 128)     295040
conv2_1 (Conv2DTranspose)    (None, 112, 112, 128)     147584
pool_1 (UpSampling2D)        (None, 224, 224, 128)     0
conv1_2 (Conv2DTranspose)    (None, 224, 224, 64)      73792
conv1_1 (Conv2DTranspose)    (None, 224, 224, 3)       1731
Total params: 17,037,059
Trainable params: 17,037,059
Non-trainable params: 0

It takes the (7, 7, 512) feature shape and reconstructs the original image dimensions (224, 224, 3).
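
If you want to double-check a summary like this without building the model, the shapes and parameter counts can be traced by hand. The following is a rough pure-Python sketch, assuming every Conv2DTranspose uses a 3x3 kernel with padding 'same' and a bias, and every UpSampling2D doubles height and width. `DECODER_LAYERS` and `trace_decoder` are names made up for this illustration.

```python
# ("up", factor) multiplies height and width by factor.
# ("convT", filters) is a 3x3 'same' Conv2DTranspose with that many filters.
DECODER_LAYERS = [
    ("up", 2), ("convT", 512), ("convT", 512), ("convT", 512),
    ("up", 2), ("convT", 512), ("convT", 512), ("convT", 512),
    ("up", 2), ("convT", 256), ("convT", 256), ("convT", 256),
    ("up", 2), ("convT", 128), ("convT", 128),
    ("up", 2), ("convT", 64), ("convT", 3),
]


def trace_decoder(input_shape=(7, 7, 512), kernel=3):
    """Return the final (height, width, channels) and the total parameter count."""
    height, width, channels = input_shape
    total_params = 0
    for kind, value in DECODER_LAYERS:
        if kind == "up":
            height, width = height * value, width * value
        else:
            # Weights: kernel * kernel * in_channels * out_channels, plus one bias per filter.
            total_params += kernel * kernel * channels * value + value
            channels = value
    return (height, width, channels), total_params


shape, params = trace_decoder()
print(shape)   # (224, 224, 3)
print(params)  # 17037059
```

Five upsampling steps give 7 * 2**5 = 224, and the parameter total matches the 17,037,059 reported in the summary.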


In summary, the mechanical way to design a decoder is to walk through the encoder in the opposite direction, applying the reverse operation at each step. As for details of Conv2DTranspose and UpSampling2D, if you want to really understand these concepts in more depth:

