Salehi Ali, Balasubramanian Madhusudhanan
Department of Electrical and Computer Engineering, The University of Memphis, Memphis TN 38152.
Neurocomputing (Amst). 2023 Feb 28;523:116-129. doi: 10.1016/j.neucom.2022.12.024. Epub 2022 Dec 15.
Dense pixel matching problems such as optical flow and disparity estimation are among the most challenging tasks in computer vision. Several deep learning methods designed for these problems have recently been successful. A sufficiently large effective receptive field (ERF) and high-resolution spatial features within a network are essential for producing high-resolution dense estimates. In this work, we present a systematic approach to designing network architectures that provide a larger receptive field while maintaining higher spatial feature resolution. To achieve a larger ERF, we used dilated convolutional layers. By aggressively increasing the dilation rates in the deeper layers, we achieved a sufficiently large ERF with significantly fewer trainable parameters. We used optical flow estimation as the primary benchmark problem to illustrate our network design strategy. The benchmark results (Sintel, KITTI, and Middlebury) indicate that our compact networks can achieve performance comparable to that of other networks in the same class.
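The parameter argument for dilation can be illustrated with simple arithmetic. The sketch below computes the theoretical receptive field of a stack of stride-1 convolutions, where each layer with kernel size k and dilation d widens the field by (k - 1) * d, and counts the trainable weights, which dilation does not change. The 3x3 kernels, 64 channels, six layers, and doubling dilation schedule are assumptions chosen for illustration, not the configuration reported in the paper. Note also that this is the theoretical receptive field; the empirically measured ERF is typically smaller.

```python
"""Theoretical receptive field vs. parameter count for stacked dilated convs.

Hypothetical illustration: kernel size, channel width, and dilation
schedules are assumptions, not the paper's architecture.
"""


def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field (pixels per axis) of stacked stride-1 convolutions.

    Each layer with kernel size k and dilation d adds (k - 1) * d pixels.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def conv_params(kernel_size: int, channels: int, num_layers: int) -> int:
    """Weights + biases for num_layers convs with equal in/out channel counts.

    Dilation spaces out the kernel taps but adds no weights, so it does
    not enter this count.
    """
    per_layer = kernel_size * kernel_size * channels * channels + channels
    return per_layer * num_layers


if __name__ == "__main__":
    k, c = 3, 64                     # assumed 3x3 kernels, 64 channels
    plain = [1] * 6                  # six undilated layers
    dilated = [1, 2, 4, 8, 16, 32]   # assumed doubling dilation schedule

    for name, sched in [("plain", plain), ("dilated", dilated)]:
        print(name, "RF:", receptive_field(k, sched),
              "params:", conv_params(k, c, len(sched)))

    # How many undilated layers would be needed to reach the same RF?
    target = receptive_field(k, dilated)
    n = 0
    while receptive_field(k, [1] * n) < target:
        n += 1
    print("undilated layers to match:", n, "params:", conv_params(k, c, n))
```

Under these assumed settings, both six-layer stacks have 221,568 parameters, but the dilated one reaches a 127-pixel theoretical receptive field versus 13 pixels for the plain one; an undilated stack would need 63 layers (2,326,464 parameters) to match it. Because every layer has stride 1 (with padding d * (k - 1) / 2 for odd k), the feature maps also stay at full input resolution, which is the trade-off the abstract describes.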