Gao Hongyang, Yuan Hao, Wang Zhengyang, Ji Shuiwang
IEEE Trans Pattern Anal Mach Intell. 2020 May;42(5):1218-1227. doi: 10.1109/TPAMI.2019.2893965. Epub 2019 Jan 18.
Transposed convolutional layers have been widely used for up-sampling in a variety of deep models, including encoder-decoder networks for semantic segmentation and deep generative models for unsupervised learning. One of the key limitations of transposed convolutional operations is that they result in the so-called checkerboard problem. This is caused by the fact that no direct relationship exists among adjacent pixels on the output feature map. To address this problem, we propose the pixel transposed convolutional layer (PixelTCL) to establish direct relationships among adjacent pixels on the up-sampled feature map. Our method is based on a fresh interpretation of the regular transposed convolutional operation. The resulting PixelTCL can be used to replace any transposed convolutional layer in a plug-and-play manner without compromising the fully trainable capabilities of the original models. The proposed PixelTCL may result in a slight decrease in efficiency, but this can be overcome by an implementation trick. Experimental results on semantic segmentation demonstrate that PixelTCL can take spatial features such as edges and shapes into account and yields more accurate segmentation outputs than transposed convolutional layers. When used in image generation tasks, PixelTCL can largely overcome the checkerboard problem suffered by regular transposed convolutional operations.
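The abstract attributes the checkerboard problem to the lack of a direct relationship among adjacent output pixels. A minimal NumPy sketch can make that claim concrete in one dimension. It assumes a common decomposition of a stride-2 transposed convolution into two independent convolutions whose outputs are interleaved. This may not match the authors' formulation exactly, and it is not PixelTCL itself. The function names `transposed_conv1d` and `interleaved_view` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def transposed_conv1d(x, k, stride=2):
    """Direct stride-`stride` transposed convolution in 1D (no padding)."""
    n, K = len(x), len(k)
    y = np.zeros((n - 1) * stride + K)
    for i in range(n):
        # Each input value scatters a scaled copy of the kernel into the output.
        y[i * stride : i * stride + K] += x[i] * k
    return y

def interleaved_view(x, k):
    """Same result for stride 2 and an even-length kernel, seen as two
    independent convolutions whose outputs are interleaved."""
    even = np.convolve(x, k[0::2])  # produces output positions 0, 2, 4, ...
    odd = np.convolve(x, k[1::2])   # produces output positions 1, 3, 5, ...
    y = np.empty(len(even) + len(odd))
    y[0::2] = even
    y[1::2] = odd
    return y

# A constant input with a kernel whose even and odd taps differ.
x = np.ones(4)
k = np.array([1.0, 2.0, 1.0, 2.0])
y = transposed_conv1d(x, k)
print(y)  # interior values alternate between 2 and 4

# Both views agree, including on random data.
rng = np.random.default_rng(0)
xr, kr = rng.standard_normal(5), rng.standard_normal(4)
assert np.allclose(transposed_conv1d(xr, kr), interleaved_view(xr, kr))
```

In this view, neighbouring output positions are computed from disjoint sets of kernel taps, so nothing ties their values together, and a flat input can already produce an alternating pattern. According to the abstract, PixelTCL addresses this by establishing direct relationships among adjacent pixels of the up-sampled map.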