Rethinking RGB-D Salient Object Detection: Models, Data Sets, and Large-Scale Benchmarks.

Authors

Fan Deng-Ping, Lin Zheng, Zhang Zhao, Zhu Menglong, Cheng Ming-Ming

Publication information

IEEE Trans Neural Netw Learn Syst. 2021 May;32(5):2075-2089. doi: 10.1109/TNNLS.2020.2996406. Epub 2021 May 3.

DOI:10.1109/TNNLS.2020.2996406

PMID:32491986

Abstract

The use of RGB-D information for salient object detection (SOD) has been extensively explored in recent years. However, relatively few efforts have been put toward modeling SOD in real-world human activity scenes with RGB-D. In this article, we fill the gap by making the following contributions to RGB-D SOD: 1) we carefully collect a new Salient Person (SIP) data set that consists of ~1K high-resolution images covering diverse real-world scenes with various viewpoints, poses, occlusions, illuminations, and backgrounds; 2) we conduct a large-scale (and, so far, the most comprehensive) benchmark comparing contemporary methods, which has long been missing in the field and can serve as a baseline for future research; we systematically summarize 32 popular models and evaluate 18 of the 32 models on seven data sets containing a total of about 97k images; and 3) we propose a simple general architecture, called the deep depth-depurator network (D3Net). It consists of a depth depurator unit (DDU) and a three-stream feature learning module (FLM), which perform low-quality depth map filtering and cross-modal feature learning, respectively. These components form a nested structure and are elaborately designed to be learned jointly. D3Net exceeds the performance of any prior contenders across all five metrics under consideration, thus serving as a strong model to advance research in this field. We also demonstrate that D3Net can be used to efficiently extract salient object masks from real scenes, enabling an effective background-changing application at a speed of 65 frames/s on a single GPU. All the saliency maps, our new SIP data set, the D3Net model, and the evaluation tools are publicly available at https://github.com/DengPingFan/D3NetBenchmark.
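
The background-changing application mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a saliency mask with values in [0, 1] is already available (in practice it would come from a SOD model; here it is supplied by hand) and blends the image over a new background pixel by pixel. The function name `change_background` and the nested-list image representation are hypothetical choices made only to keep the example dependency-free.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # one RGB pixel, each channel in 0..255
Image = List[List[Pixel]]      # rows of pixels
Mask = List[List[float]]       # saliency per pixel, expected in [0, 1]


def change_background(image: Image, mask: Mask, background: Image) -> Image:
    """Blend `image` over `background`, weighting each pixel by its saliency.

    Hypothetical helper for illustration: the mask is assumed to be the
    output of some salient object detector, which is not part of this code.
    """
    if not (len(image) == len(mask) == len(background)):
        raise ValueError("image, mask, and background must have the same height")

    result: Image = []
    for img_row, mask_row, bg_row in zip(image, mask, background):
        if not (len(img_row) == len(mask_row) == len(bg_row)):
            raise ValueError("image, mask, and background must have the same width")
        out_row: List[Pixel] = []
        for fg, alpha, bg in zip(img_row, mask_row, bg_row):
            a = min(max(alpha, 0.0), 1.0)  # clamp saliency to [0, 1]
            # Salient pixels keep the original colour; the rest take the new background.
            blended = tuple(round(a * f + (1.0 - a) * b) for f, b in zip(fg, bg))
            out_row.append(blended)
        result.append(out_row)
    return result
```

A mask value of 1 keeps the original pixel, 0 replaces it with the new background, and intermediate values produce a soft edge. A real-time pipeline would vectorize this over whole frames with an array library instead of Python loops.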

