Laplacian Pyramid Neural Network for Dense Continuous-Value Regression for Complex Scenes.

Publication information

IEEE Trans Neural Netw Learn Syst. 2021 Nov;32(11):5034-5046. doi: 10.1109/TNNLS.2020.3026669. Epub 2021 Oct 27.

DOI:10.1109/TNNLS.2020.3026669

PMID:33290230

Abstract

Many computer vision tasks, such as monocular depth estimation and height estimation from a satellite orthophoto, share a common underlying goal: regressing dense continuous values for the pixels of a single input image. We define them as dense continuous-value regression (DCR) tasks. Recent approaches based on deep convolutional neural networks have significantly improved performance on DCR tasks, particularly in pixelwise regression accuracy. However, simultaneously preserving the global structure and fine object details in complex scenes remains challenging. In this article, we take advantage of the efficiency of the Laplacian pyramid in representing multiscale content to reconstruct high-quality signals for complex scenes. We design a Laplacian pyramid neural network (LAPNet), which consists of a Laplacian pyramid decoder (LPD) for signal reconstruction and an adaptive dense feature fusion (ADFF) module that fuses features from the input image. More specifically, we build an LPD to effectively express both global and local scene structures. In our LPD, the upper and lower levels represent scene layouts and shape details, respectively. We introduce a residual refinement module that progressively supplements the signal prediction at each level with high-frequency details. To recover the signal at each individual level of the pyramid, an ADFF module is proposed to adaptively fuse multiscale image features for accurate prediction. We conduct comprehensive experiments to evaluate a number of variants of our model on three important DCR tasks, i.e., monocular depth estimation, single-image height estimation, and density map estimation for crowd counting. Experiments demonstrate that our method achieves new state-of-the-art performance in both qualitative and quantitative evaluations on NYU-D V2 and KITTI for monocular depth estimation, on the challenging Urban Semantic 3D (US3D) dataset for satellite height estimation, and on four challenging benchmarks for crowd counting. These results demonstrate that the proposed LAPNet is a universal and effective architecture for DCR problems.
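The coarse-to-fine idea behind the LPD can be illustrated with a classical, non-learned Laplacian pyramid. The sketch below is an illustration under stated assumptions, not LAPNet itself: it uses 2x2 average pooling for downsampling and nearest-neighbour repetition for upsampling (our choices; the abstract does not specify LAPNet's operators), and the names build_laplacian_pyramid, reconstruct, downsample and upsample are hypothetical. The coarsest level carries the global layout, each finer level stores a residual of high-frequency detail, and reconstruction upsamples and adds residuals level by level, which is the role the abstract assigns to the learned residual refinement module.

```python
import numpy as np

# Illustrative sketch only: a classical Laplacian pyramid, NOT the learned LAPNet decoder.
# Assumed operators: 2x2 average pooling (down) and nearest-neighbour repetition (up).


def downsample(x: np.ndarray) -> np.ndarray:
    """Halve resolution with 2x2 average pooling (height and width must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(x: np.ndarray) -> np.ndarray:
    """Double resolution by repeating each value in a 2x2 block."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def build_laplacian_pyramid(x: np.ndarray, levels: int) -> list:
    """Return [L_0, ..., L_{levels-1}, G_top], finest residual first, coarsest map last."""
    pyramid = []
    current = x
    for _ in range(levels):
        coarser = downsample(current)
        # Band-pass residual: the detail lost when going one level coarser.
        pyramid.append(current - upsample(coarser))
        current = coarser
    # Coarsest low-pass map: the global layout of the signal.
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid: list) -> np.ndarray:
    """Coarse-to-fine: start from the top map, then upsample and add each residual."""
    *residuals, signal = pyramid
    for residual in reversed(residuals):
        signal = upsample(signal) + residual
    return signal


rng = np.random.default_rng(0)
depth = rng.random((32, 32))  # stand-in for a dense map such as a depth image
pyr = build_laplacian_pyramid(depth, levels=3)
print([p.shape for p in pyr])  # [(32, 32), (16, 16), (8, 8), (4, 4)]
print(np.allclose(reconstruct(pyr), depth))  # True

# Layout only: drop every residual and keep the coarsest level.
layout = reconstruct([np.zeros_like(r) for r in pyr[:-1]] + [pyr[-1]])
print(layout.shape)  # (32, 32)
```

Zeroing every residual and reconstructing from the top level alone yields a blocky map that keeps only the global layout (its mean equals that of the input), while adding the residuals back restores the fine detail exactly. In LAPNet the per-level residuals are predicted from fused image features rather than computed from a known signal, so this sketch only shows the pyramid bookkeeping, not the learned modules.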

Similar articles

Laplacian Pyramid Neural Network for Dense Continuous-Value Regression for Complex Scenes.

IEEE Trans Neural Netw Learn Syst. 2021 Nov;32(11):5034-5046. doi: 10.1109/TNNLS.2020.3026669. Epub 2021 Oct 27.

DCPNet: A Densely Connected Pyramid Network for Monocular Depth Estimation.

Sensors (Basel). 2021 Oct 13;21(20):6780. doi: 10.3390/s21206780.

LapUNet: a novel approach to monocular depth estimation using dynamic laplacian residual U-shape networks.

Sci Rep. 2024 Oct 9;14(1):23544. doi: 10.1038/s41598-024-74445-x.

Monocular Depth Estimation Using a Laplacian Image Pyramid with Local Planar Guidance Layers.

Sensors (Basel). 2023 Jan 11;23(2):845. doi: 10.3390/s23020845.

Multimodal medical image fusion via laplacian pyramid and convolutional neural network reconstruction with local gradient energy strategy.

Comput Biol Med. 2020 Nov;126:104048. doi: 10.1016/j.compbiomed.2020.104048. Epub 2020 Oct 8.

An effective modular approach for crowd counting in an image using convolutional neural networks.

Sci Rep. 2022 Apr 6;12(1):5795. doi: 10.1038/s41598-022-09685-w.

RT-ViT: Real-Time Monocular Depth Estimation Using Lightweight Vision Transformers.

Sensors (Basel). 2022 May 19;22(10):3849. doi: 10.3390/s22103849.

Depth Estimation from Light Field Geometry Using Convolutional Neural Networks.

Sensors (Basel). 2021 Sep 10;21(18):6061. doi: 10.3390/s21186061.

Improved dual-scale residual network for image super-resolution.

Neural Netw. 2020 Dec;132:84-95. doi: 10.1016/j.neunet.2020.08.008. Epub 2020 Aug 19.

Crowd Counting Based on Multiscale Spatial Guided Perception Aggregation Network.

IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):17465-17478. doi: 10.1109/TNNLS.2023.3304348. Epub 2024 Dec 2.
