
Rethinking RGB-D Salient Object Detection: Models, Data Sets, and Large-Scale Benchmarks.

Authors

Fan Deng-Ping, Lin Zheng, Zhang Zhao, Zhu Menglong, Cheng Ming-Ming

Publication

IEEE Trans Neural Netw Learn Syst. 2021 May;32(5):2075-2089. doi: 10.1109/TNNLS.2020.2996406. Epub 2021 May 3.

Abstract

The use of RGB-D information for salient object detection (SOD) has been extensively explored in recent years. However, relatively few efforts have been put toward modeling SOD in real-world human activity scenes with RGB-D. In this article, we fill the gap by making the following contributions to RGB-D SOD: 1) we carefully collect a new Salient Person (SIP) data set that consists of ~1K high-resolution images covering diverse real-world scenes from various viewpoints, poses, occlusions, illuminations, and backgrounds; 2) we conduct a large-scale (and, so far, the most comprehensive) benchmark comparing contemporary methods, which has long been missing in the field and can serve as a baseline for future research; we systematically summarize 32 popular models and evaluate 18 of these 32 models on seven data sets containing a total of about 97k images; and 3) we propose a simple general architecture, called the deep depth-depurator network (D3Net). It consists of a depth depurator unit (DDU) and a three-stream feature learning module (FLM), which perform low-quality depth map filtering and cross-modal feature learning, respectively. These components form a nested structure and are elaborately designed to be learned jointly. D3Net exceeds the performance of all prior contenders across all five metrics under consideration, thus serving as a strong model to advance research in this field. We also demonstrate that D3Net can be used to efficiently extract salient object masks from real scenes, enabling an effective background-changing application at a speed of 65 frames/s on a single GPU. All the saliency maps, our new SIP data set, the D3Net model, and the evaluation tools are publicly available at https://github.com/DengPingFan/D3NetBenchmark.
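The gating idea behind the depth depurator unit can be sketched as follows: score each depth map's quality and fall back to the RGB-only stream when the depth looks unreliable. This is only a minimal NumPy illustration under assumed details — the entropy-based quality proxy, the threshold, and the function names here are hypothetical, not the paper's learned criterion:

```python
import numpy as np

def depth_quality_score(depth):
    """Hypothetical quality proxy: informative depth maps tend to have a
    spread-out value histogram, while degraded ones collapse into few bins."""
    hist, _ = np.histogram(depth, bins=10, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = -(p * np.log(p)).sum()   # higher = more informative depth
    return entropy / np.log(10)        # normalize to [0, 1]

def depurator_gate(pred_rgbd, pred_rgb, depth, threshold=0.5):
    """Keep the cross-modal (RGB-D) prediction only when the depth map
    passes the quality check; otherwise use the RGB-only prediction."""
    if depth_quality_score(depth) >= threshold:
        return pred_rgbd
    return pred_rgb
```

For example, a constant (degenerate) depth map scores zero entropy and routes the output to the RGB-only stream, while a well-spread depth map routes it to the RGB-D stream. In the actual D3Net this selection is built into a nested, jointly trained architecture rather than a hand-set threshold.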
