Deep High-Resolution Representation Learning for Visual Recognition.

Authors

Wang Jingdong, Sun Ke, Cheng Tianheng, Jiang Borui, Deng Chaorui, Zhao Yang, Liu Dong, Mu Yadong, Tan Mingkui, Wang Xinggang, Liu Wenyu, Xiao Bin

Publication information

IEEE Trans Pattern Anal Mach Intell. 2021 Oct;43(10):3349-3364. doi: 10.1109/TPAMI.2020.2983686. Epub 2021 Sep 2.

DOI: 10.1109/TPAMI.2020.2983686

PMID: 32248092

Abstract

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, named High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. There are two key characteristics: (i) the high-to-low resolution convolution streams are connected in parallel, and (ii) information is repeatedly exchanged across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that HRNet is a stronger backbone for computer vision problems. All code is available at https://github.com/HRNet.
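
The two characteristics named in the abstract, parallel resolution streams and repeated cross-resolution exchange, can be illustrated with a toy sketch. The abstract does not specify the operators, so everything here is an assumption made for illustration: 2x2 average pooling stands in for downsampling, nearest-neighbour repetition for upsampling, element-wise summation for fusion, and the function names (`downsample`, `upsample`, `exchange`, `run_parallel_streams`) are hypothetical. It is not the HRNet implementation, which is published at the URL above.

```python
# Illustrative sketch only: pooling, nearest-neighbour upsampling and
# summation are assumed stand-ins, not the operators used in HRNet.
# Feature maps are plain lists of lists with even height and width.

def downsample(fmap):
    """Halve resolution with 2x2 average pooling (assumed operator)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            (fmap[r][c] + fmap[r][c + 1] + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4.0
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


def upsample(fmap):
    """Double resolution by repeating each value 2x2 (assumed operator)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a, b):
    """Element-wise sum of two equally sized maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def exchange(high, low):
    """One cross-resolution exchange: each stream receives the other, resized."""
    new_high = add(high, upsample(low))
    new_low = add(low, downsample(high))
    return new_high, new_low


def run_parallel_streams(high, low, steps):
    """Keep both streams alive and exchange information `steps` times."""
    for _ in range(steps):
        high, low = exchange(high, low)
    return high, low


high = [[1.0] * 4 for _ in range(4)]   # 4x4 high-resolution stream
low = [[0.0] * 2 for _ in range(2)]    # 2x2 low-resolution stream
high, low = run_parallel_streams(high, low, steps=2)
```

The point of the sketch is structural: the high-resolution stream is never discarded and rebuilt, it keeps its 4x4 shape throughout while still receiving information from the coarser stream at every exchange.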

Similar articles

HROM: Learning High-Resolution Representation and Object-Aware Masks for Visual Object Tracking.

Sensors (Basel). 2020 Aug 26;20(17):4807. doi: 10.3390/s20174807.

Contextual Transformer Networks for Visual Recognition.

IEEE Trans Pattern Anal Mach Intell. 2023 Feb;45(2):1489-1500. doi: 10.1109/TPAMI.2022.3164083. Epub 2023 Jan 6.

A Holistically-Guided Decoder for Deep Representation Learning With Applications to Semantic Segmentation and Object Detection.

IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):11390-11406. doi: 10.1109/TPAMI.2021.3114342. Epub 2023 Sep 5.

Deep High-Resolution Representation Learning for Cross-Resolution Person Re-Identification.

IEEE Trans Image Process. 2021;30:8913-8925. doi: 10.1109/TIP.2021.3120054. Epub 2021 Oct 28.

Attention-Based Context Aware Network for Semantic Comprehension of Aerial Scenery.

Sensors (Basel). 2021 Mar 11;21(6):1983. doi: 10.3390/s21061983.

Automatic crack segmentation using deep high-resolution representation learning.

Appl Opt. 2021 Jul 20;60(21):6080-6090. doi: 10.1364/AO.423406.

Human Pose Estimation Based on Efficient and Lightweight High-Resolution Network (EL-HRNet).

Sensors (Basel). 2024 Jan 9;24(2):396. doi: 10.3390/s24020396.

Learning Enriched Features for Fast Image Restoration and Enhancement.

IEEE Trans Pattern Anal Mach Intell. 2023 Feb;45(2):1934-1948. doi: 10.1109/TPAMI.2022.3167175. Epub 2023 Jan 6.

Detection, segmentation, and 3D pose estimation of surgical tools using convolutional neural networks and algebraic geometry.

Med Image Anal. 2021 May;70:101994. doi: 10.1016/j.media.2021.101994. Epub 2021 Feb 7.

Cited by

DE-HRNet: Detail enhanced high-resolution network for human pose estimation.

PLoS One. 2025 Sep 2;20(9):e0325540. doi: 10.1371/journal.pone.0325540. eCollection 2025.

A lightweight small object detection model for UAV images based on deep semantic integration.

Sci Rep. 2025 Aug 29;15(1):31888. doi: 10.1038/s41598-025-16878-6.

X-FASNet: cross-scale feature-aware with self-attention network for cognitive decline assessment in Alzheimer's disease.

Front Neurol. 2025 Aug 12;16:1630838. doi: 10.3389/fneur.2025.1630838. eCollection 2025.

Enhanced Disease Segmentation in Pear Leaves via Edge-Aware Multi-Scale Attention Network.

Sensors (Basel). 2025 Aug 14;25(16):5058. doi: 10.3390/s25165058.

Neonatal pose estimation in the unaltered clinical environment with fusion of RGB, depth and IR images.

NPJ Digit Med. 2025 Aug 22;8(1):539. doi: 10.1038/s41746-025-01929-z.

30 Years of simultaneous crop & land cover land use maps for Middle Rio Grande from 1994 to 2024.

Sci Data. 2025 Aug 22;12(1):1462. doi: 10.1038/s41597-025-05771-6.

Steel surface defect segmentation with SME-DeeplabV3.

PLoS One. 2025 Aug 14;20(8):e0329628. doi: 10.1371/journal.pone.0329628. eCollection 2025.

Contactless Vital Sign Monitoring: A Review Towards Multi-Modal Multi-Task Approaches.

Sensors (Basel). 2025 Aug 4;25(15):4792. doi: 10.3390/s25154792.

TOSQ: Transparent Object Segmentation via Query-Based Dictionary Lookup with Transformers.

Sensors (Basel). 2025 Jul 30;25(15):4700. doi: 10.3390/s25154700.

Multimodal fusion approach for sports injury prevention and pose keypoint detection.

PLoS One. 2025 Aug 11;20(8):e0327911. doi: 10.1371/journal.pone.0327911. eCollection 2025.
