Xu Gangwei, Wang Yun, Cheng Junda, Tang Jinhui, Yang Xin
IEEE Trans Pattern Anal Mach Intell. 2024 Apr;46(4):2461-2474. doi: 10.1109/TPAMI.2023.3335480. Epub 2024 Mar 6.
Stereo matching is a fundamental building block for many vision and robotics applications. An informative and concise cost volume representation is vital for accurate and efficient stereo matching. In this article, we present a novel cost volume construction method, named attention concatenation volume (ACV), which generates attention weights from correlation clues to suppress redundant information and enhance matching-related information in the concatenation volume. The ACV can be seamlessly embedded into most stereo matching networks; the resulting networks can use a more lightweight aggregation network while achieving higher accuracy. To enable real-time performance, we further design a fast version of ACV, named Fast-ACV, which generates high-likelihood disparity hypotheses and the corresponding attention weights from low-resolution correlation clues. This significantly reduces computational and memory cost while maintaining satisfactory accuracy. Finally, we design a highly accurate network, ACVNet, and a real-time network, Fast-ACVNet, based on ACV and Fast-ACV respectively, which achieve state-of-the-art performance on several benchmarks.
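The core idea of the abstract, attention weights derived from correlation that filter a concatenation volume, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names are hypothetical, a plain softmax over the disparity axis stands in for whatever learned module the paper uses to produce attention weights, and element-wise multiplication is assumed as the filtering step. The last function sketches the Fast-ACV notion of keeping only a few high-likelihood disparity hypotheses per pixel, here by a simple top-k selection.

```python
import numpy as np

def correlation_volume(left, right, max_disp):
    """Channel-averaged correlation between (C, H, W) feature maps.

    Returns a (D, H, W) volume; a left pixel at column x is compared
    with the right pixel at column x - d. Out-of-range entries stay 0.
    """
    C, H, W = left.shape
    corr = np.zeros((max_disp, H, W), dtype=left.dtype)
    for d in range(max_disp):
        corr[d, :, d:] = (left[:, :, d:] * right[:, :, :W - d]).mean(axis=0)
    return corr

def concatenation_volume(left, right, max_disp):
    """Stack left and shifted right features into a (2C, D, H, W) volume."""
    C, H, W = left.shape
    vol = np.zeros((2 * C, max_disp, H, W), dtype=left.dtype)
    for d in range(max_disp):
        vol[:C, d, :, d:] = left[:, :, d:]
        vol[C:, d, :, d:] = right[:, :, :W - d]
    return vol

def attention_concat_volume(left, right, max_disp):
    """Weight the concatenation volume by correlation-derived attention.

    Assumption: attention is a softmax over disparity (the paper's
    attention is produced by a network, not shown here), applied by
    element-wise multiplication broadcast over the channel axis.
    """
    corr = correlation_volume(left, right, max_disp)
    e = np.exp(corr - corr.max(axis=0, keepdims=True))  # stable softmax
    attn = e / e.sum(axis=0, keepdims=True)              # (D, H, W)
    concat = concatenation_volume(left, right, max_disp)
    return attn[None] * concat, attn

def topk_disparity_hypotheses(attn, k):
    """Keep the k most likely disparities per pixel (illustrative only).

    Returns indices and weights of shape (k, H, W), sorted so that the
    last entry along axis 0 is the most likely disparity.
    """
    idx = np.argsort(attn, axis=0)[-k:]
    weights = np.take_along_axis(attn, idx, axis=0)
    return idx, weights
```

In this sketch the concatenation volume keeps the full feature content, while the cheaper single-channel correlation decides which disparity slices are emphasized. Restricting later processing to the top-k hypotheses is one way the volume could be made sparse, which is in the spirit of the cost reduction the abstract attributes to Fast-ACV.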