Stergiou Alexandros, Poppe Ronald
IEEE Trans Image Process. 2023;32:251-266. doi: 10.1109/TIP.2022.3227503. Epub 2022 Dec 21.
Pooling layers are essential building blocks of convolutional neural networks (CNNs): they reduce computational overhead and increase the receptive fields of subsequent convolutional operations. Their goal is to produce downsampled volumes that closely resemble the input volume while, ideally, also being computationally and memory efficient. Meeting both requirements remains a challenge. To this end, we propose an adaptive and exponentially weighted pooling method: adaPool. Our method learns a region-specific fusion of two sets of pooling kernels, based on the exponent of the Dice-Sørensen coefficient and the exponential maximum, respectively. AdaPool improves the preservation of detail on a range of tasks including image and video classification and object detection. A key property of adaPool is its bidirectional nature: in contrast to common pooling methods, the learned weights can also be used to upsample activation maps. We term this method adaUnPool. We evaluate adaUnPool on image and video super-resolution and frame interpolation. For benchmarking, we introduce Inter4K, a novel high-quality, high frame-rate video dataset. Our experiments demonstrate that adaPool systematically achieves better results across tasks and backbones, while introducing only minor additional computational and memory overhead.