Suppr 超能文献



RGB-FIR Multimodal Pedestrian Detection with Cross-Modality Context Attentional Model

Authors

Wang Han, Jin Lei, Wang Guangcheng, Liu Wenjie, Shi Quan, Hou Yingyan, Liu Jiali

Affiliations

School of Transportation and Civil Engineering, Nantong University, Nantong 226019, China.

Key Laboratory of Target Cognition and Application Technology, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China.

Publication

Sensors (Basel). 2025 Jun 20;25(13):3854. doi: 10.3390/s25133854.

DOI: 10.3390/s25133854
PMID: 40648113
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12252401/
Abstract

Pedestrian detection is an important research topic in visual cognition and autonomous driving systems. The YOLO model significantly improved detection speed and accuracy, and to achieve full-day (daytime and nighttime) detection performance, multimodal YOLO models based on RGB-FIR image pairs have become a research hotspot. Existing work focuses on designing fusion modules applied after feature extraction in the RGB and FIR branch backbone networks, yielding a multimodal backbone framework based on back-end fusion. However, these methods overlook the complementarity and prior knowledge between modalities and scales during front-end raw feature extraction in the RGB and FIR branches. As a result, the performance of the back-end fusion framework largely depends on the representational ability of each modality's raw front-end features. This paper proposes a novel RGB-FIR multimodal backbone framework based on a cross-modality context attentional model (CCAM). Unlike existing work, it adopts a multi-level fusion design. At the front end of the parallel RGB-FIR backbone, a CCAM module is constructed for the raw features at each scale: the fusion result of the lower-level RGB and FIR features is fully exploited to optimize the spatial weights of the upper-level RGB and FIR features, achieving cross-modality and cross-scale complementarity between adjacent feature-extraction stages. At the back end, a channel-spatial joint attention model (CBAM) is combined with self-attention models to obtain the final RGB-FIR fusion features at each scale from the CCAM-optimized RGB and FIR features. Comparative experiments on multiple public RGB-FIR datasets, across several evaluation metrics and against current RGB-FIR multimodal YOLO models, show that the method significantly improves the accuracy and robustness of pedestrian detection.
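The front-end CCAM idea described in the abstract — using the lower-level RGB-FIR fusion result to derive spatial weights for the upper-level RGB and FIR features — can be sketched as follows. This is a minimal NumPy illustration under assumptions of our own (a mean-pooled single-channel weight map, nearest-neighbour resizing); the paper's actual CCAM design is not given here, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_weight(fused, out_hw):
    """Collapse channels of a fused lower-level feature map to one spatial
    attention map in (0, 1), then nearest-neighbour resize it to the
    upper level's spatial size."""
    m = sigmoid(fused.mean(axis=0))        # (H, W) weight map
    H, W = m.shape
    h2, w2 = out_hw
    rows = np.arange(h2) * H // h2         # nearest-neighbour row indices
    cols = np.arange(w2) * W // w2         # nearest-neighbour col indices
    return m[np.ix_(rows, cols)]           # (h2, w2)

def ccam_step(rgb_hi, fir_hi, fused_lo):
    """Reweight upper-level RGB and FIR features (C, h, w) with a spatial
    map derived from the lower-level RGB-FIR fusion result."""
    w = spatial_weight(fused_lo, rgb_hi.shape[1:])
    return rgb_hi * w, fir_hi * w
```

The point of the sketch is only the data flow: the fused lower-scale map gates both modalities' next-scale features, so each modality's raw features are already informed by the other before the back-end fusion runs.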

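The back-end step — combining channel and spatial attention (CBAM-style) to fuse the CCAM-optimized RGB and FIR features at each scale — can likewise be sketched in NumPy. Again this is only an illustrative toy under assumed pooling and fusion choices (the paper additionally uses self-attention, omitted here), and names such as `cbam_fuse` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # squeeze spatial dims: one gate per channel, shape (C, 1, 1)
    return sigmoid(feat.mean(axis=(1, 2)))[:, None, None]

def spatial_attention(feat):
    # squeeze channels: one gate per pixel, shape (1, H, W)
    return sigmoid(feat.mean(axis=0))[None, :, :]

def cbam_fuse(rgb, fir):
    """Toy back-end fusion: concatenate the two modalities along the
    channel axis, apply channel then spatial attention, and sum the two
    halves back into one fused map per scale."""
    x = np.concatenate([rgb, fir], axis=0)
    x = x * channel_attention(x)
    x = x * spatial_attention(x)
    c = rgb.shape[0]
    return x[:c] + x[c:]
```

Because both gates lie in (0, 1), each modality's contribution to the fused map is softly reweighted per channel and per pixel rather than hard-selected.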

Figures (g001-g013):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b859/12252401/f0cc74e7918c/sensors-25-03854-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b859/12252401/f7f6dbd3aadd/sensors-25-03854-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b859/12252401/99b80c2b1949/sensors-25-03854-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b859/12252401/4ea2c517d0cb/sensors-25-03854-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b859/12252401/9ecd62f7dd60/sensors-25-03854-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b859/12252401/f41b5088be27/sensors-25-03854-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b859/12252401/c2953b8a91c9/sensors-25-03854-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b859/12252401/1ddb5e841e44/sensors-25-03854-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b859/12252401/f2624c6a5994/sensors-25-03854-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b859/12252401/6e6e6dd209b1/sensors-25-03854-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b859/12252401/f71bcf9e36a6/sensors-25-03854-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b859/12252401/a11d4b91a0b2/sensors-25-03854-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b859/12252401/5a4fcb1ae616/sensors-25-03854-g013.jpg

Similar Articles

1. RGB-FIR Multimodal Pedestrian Detection with Cross-Modality Context Attentional Model. Sensors (Basel). 2025 Jun 20;25(13):3854. doi: 10.3390/s25133854.
2. Short-Term Memory Impairment.
3. TFDet: Target-Aware Fusion for RGB-T Pedestrian Detection. IEEE Trans Neural Netw Learn Syst. 2024 Aug 23;PP. doi: 10.1109/TNNLS.2024.3443455.
4. The Black Book of Psychotropic Dosing and Monitoring. Psychopharmacol Bull. 2024 Jul 8;54(3):8-59.
5. Influence of early through late fusion on pancreas segmentation from imperfectly registered multimodal magnetic resonance imaging. J Med Imaging (Bellingham). 2025 Mar;12(2):024008. doi: 10.1117/1.JMI.12.2.024008. Epub 2025 Apr 26.
6. Develop intelligent waste bin prototype based on fusion feature recognition of sounds and RGB images. Waste Manag. 2025 Aug 1;204:114959. doi: 10.1016/j.wasman.2025.114959. Epub 2025 Jun 18.
7. DASNet: a dual branch multi level attention sheep counting network. Sci Rep. 2025 Jul 2;15(1):23228. doi: 10.1038/s41598-025-97929-w.
8. Systemic Inflammatory Response Syndrome.
9. Management of urinary stones by experts in stone disease (ESD 2025). Arch Ital Urol Androl. 2025 Jun 30;97(2):14085. doi: 10.4081/aiua.2025.14085.
10. Home treatment for mental health problems: a systematic review. Health Technol Assess. 2001;5(15):1-139. doi: 10.3310/hta5150.
