This article looks at the question "What are dilated convolutions for?" and at an answer that should be a useful reference for anyone tackling the same problem.

Problem description

I refer to Multi-Scale Context Aggregation by Dilated Convolutions.

  • A 2x2 kernel would have holes in it such that it becomes a 3x3 kernel.
  • A 3x3 kernel would have holes in it such that it becomes a 5x5 kernel.
  • The above assumes an interval of 1, of course.

I can clearly see that this allows you to effectively use 4 parameters while having a receptive field of 3x3, and 9 parameters while having a receptive field of 5x5.
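To make that kernel arithmetic concrete, here is a minimal NumPy sketch (the helper name `dilate_kernel` is hypothetical, not from the paper) that materializes a dilated kernel as the larger, mostly-zero kernel described above:

```python
import numpy as np

def dilate_kernel(kernel, dilation):
    """Insert (dilation - 1) zeros between adjacent kernel weights.

    An "interval" of 1 between weights corresponds to dilation = 2.
    """
    k = kernel.shape[0]
    size = dilation * (k - 1) + 1            # 3x3 with dilation 2 -> 5x5
    out = np.zeros((size, size), dtype=kernel.dtype)
    out[::dilation, ::dilation] = kernel     # place the original weights
    return out

kernel = np.arange(1.0, 10.0).reshape(3, 3)
dilated = dilate_kernel(kernel, 2)
print(dilated.shape)                # (5, 5)  -- 5x5 footprint
print(np.count_nonzero(dilated))    # 9      -- still only 9 parameters
```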

Is the point of dilated convolutions simply to save on parameters while reaping the benefit of a larger receptive field, and thus to save memory and computation?

Solution

TLDR

  1. Dilated convolutions have generally improved performance (see the better semantic segmentation results in Multi-Scale Context Aggregation by Dilated Convolutions).
  2. More importantly, the architecture is based on the fact that dilated convolutions support exponential expansion of the receptive field without loss of resolution or coverage.
  3. They allow a larger receptive field at the same computation and memory cost while also preserving resolution.
  4. Pooling and strided convolutions are similar concepts, but both reduce the resolution (see the sketch after this list).
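
The resolution claim in points 3 and 4 is easy to see in code. Below is a minimal sketch in PyTorch (our own illustration; the answer itself contains no code): with padding matched to the dilation, a dilated convolution keeps the spatial size, while a strided convolution or pooling halves it.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)

dilated = nn.Conv2d(1, 1, kernel_size=3, dilation=2, padding=2)
strided = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)
pooled = nn.MaxPool2d(kernel_size=2)

print(dilated(x).shape)  # torch.Size([1, 1, 32, 32]) -- resolution preserved
print(strided(x).shape)  # torch.Size([1, 1, 16, 16]) -- resolution halved
print(pooled(x).shape)   # torch.Size([1, 1, 16, 16]) -- resolution halved
```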

@Rahul referenced WaveNet, which puts it very succinctly in section 2.1, Dilated Causal Convolutions. It is also worth looking at Multi-Scale Context Aggregation by Dilated Convolutions; I break down its figures further here:

  • Figure (a) is a 1-dilated 3x3 convolution filter. In other words, it's a standard 3x3 convolution filter.
  • Figure (b) is a 2-dilated 3x3 convolution filter. The red dots are where the weights are and everywhere else is 0. In other words, it's a 5x5 convolution filter with 9 non-zero weights and everywhere else 0, as mentioned in the question. The receptive field in this case is 7x7 because each unit in the previous output has a receptive field of 3x3. The highlighted portions in blue show the receptive field and NOT the convolution filter (you could see it as a convolution filter if you wanted to but it's not helpful).
  • Figure (c) is a 4-dilated 3x3 convolution filter. It's a 9x9 convolution filter with 9 non-zero weights and everywhere else 0. From (b), each unit now has a 7x7 receptive field, and hence you can see a 7x7 blue portion around each red dot. (These receptive-field sizes are checked numerically in the sketch below.)
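
The receptive-field sizes quoted for (b) and (c) can be checked empirically. Here is a hedged sketch using PyTorch autograd (our own check, not part of the original answer): the set of input pixels that receive gradient from a single output pixel is exactly that pixel's receptive field.

```python
import torch
import torch.nn as nn

def receptive_field_side(layers, size=64):
    # Single-channel input with gradient tracking.
    x = torch.randn(1, 1, size, size, requires_grad=True)
    y = nn.Sequential(*layers)(x)
    # Backpropagate from the single centre output pixel; with random
    # weights, the non-zero-gradient footprint is the receptive field.
    y[0, 0, y.shape[2] // 2, y.shape[3] // 2].backward()
    rows = (x.grad[0, 0].abs().sum(dim=1) != 0).sum().item()
    return rows  # side length of the (square) receptive field

conv = lambda d: nn.Conv2d(1, 1, 3, dilation=d, padding=d, bias=False)
print(receptive_field_side([conv(1), conv(1)]))           # 5
print(receptive_field_side([conv(1), conv(2)]))           # 7, as in (b)
print(receptive_field_side([conv(1), conv(2), conv(4)]))  # 15
```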

To draw an explicit contrast, consider this:

  • If we use 3 successive layers of 3x3 convolution filters with a stride of 1, the effective receptive field is only 7x7 at the end. However, at the same computation and memory cost, we can achieve 15x15 with dilated convolutions. Both operations preserve resolution.
  • If we use 3 successive layers of 3x3 convolution filters with a stride of 2 each, so that the sampling interval grows at the same exponential rate as the dilations in the paper, we also get a 15x15 receptive field at the end, but with an eventual loss of coverage as the strides compound (and, unlike dilation, a loss of resolution). This loss of coverage means that at some point the effective receptive field is not what we saw above: some parts no longer overlap. (The receptive-field arithmetic for both cases is worked through in the sketch after this list.)
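
The 7x7 and 15x15 numbers follow from the standard receptive-field recurrence. Here is a small sketch of that arithmetic (our own; the function name `analytic_rf` is made up for illustration):

```python
# Each layer adds (kernel - 1) * dilation * jump to the receptive field,
# where jump is the current distance between adjacent units, measured
# in input pixels.
def analytic_rf(layers):
    """layers: iterable of (kernel, stride, dilation) tuples."""
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (k - 1) * d * jump
        jump *= s
    return rf

# Three 3x3 layers, stride 1, no dilation: 7x7 receptive field.
print(analytic_rf([(3, 1, 1)] * 3))                    # 7
# Three 3x3 layers with dilations 1, 2, 4, as in the paper: 15x15.
print(analytic_rf([(3, 1, 1), (3, 1, 2), (3, 1, 4)]))  # 15
# Three 3x3 layers with stride 2 each: also 15x15, but resolution shrinks.
print(analytic_rf([(3, 2, 1)] * 3))                    # 15
```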

That concludes this article on what dilated convolutions are for. We hope the answer above is helpful, and thank you for your support!
