Object Detection in Multispectral Remote Sensing Images Based on Cross-Modal Cross-Attention.

Authors

Zhao Pujie, Ye Xia, Du Ziang

Affiliation

Xi'an Research Institute of High-Tech, Xi'an 710025, China.

Publication

Sensors (Basel). 2024 Jun 24;24(13):4098. doi: 10.3390/s24134098.

DOI:10.3390/s24134098
PMID:39000877
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11244361/
Abstract

In complex environments, a single visible image is not sufficient to perceive the scene. This paper proposes a novel dual-stream real-time detector for target detection in extreme environments such as nighttime and fog, which efficiently utilises both visible and infrared images to achieve Fast All-Weather environment sensing (FAWDet). First, so that the network can process information from different modalities simultaneously, the state-of-the-art end-to-end detector YOLOv8 is extended by expanding its backbone in parallel into a dual stream. Then, to avoid information loss as the network deepens, a cross-modal feature enhancement module is designed, which enhances the features of each modality through cross-modal attention mechanisms and thereby improves the detection of small targets. In addition, to address the significant differences between modal features, a three-stage fusion strategy is proposed that optimises feature integration by fusing along the spatial, channel and overall dimensions. The cross-modal feature fusion module is trained end to end. Extensive experiments on two datasets validate that the proposed method achieves state-of-the-art performance in detecting small targets. The cross-modal real-time detector not only demonstrates excellent stability and robust detection performance, but also provides a new solution for target detection in extreme environments.
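
The abstract does not spell out the internals of the cross-modal feature enhancement module, so the following is only a minimal sketch of the general idea of cross-modal attention, not the authors' implementation. It assumes plain scaled dot-product attention with no learned projections, a residual connection, and toy feature values; the function names `cross_attend` and `enhance_both` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, context):
    """Enhance each query token with an attention-weighted mix of context tokens.

    queries, context: lists of equal-length feature vectors (lists of floats).
    Assumption: the context tokens act as both keys and values, with no
    learned projection matrices (a real module would normally learn these).
    """
    dim = len(queries[0])
    enhanced = []
    for q in queries:
        # Scaled dot-product similarity between this token and every context token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        # Attention-weighted sum of the context tokens.
        mixed = [sum(w * v[i] for w, v in zip(weights, context)) for i in range(dim)]
        # Residual connection: the original feature is kept and the mix is added.
        enhanced.append([qi + mi for qi, mi in zip(q, mixed)])
    return enhanced

def enhance_both(visible, infrared):
    """Symmetric enhancement: each modality attends to the other."""
    return cross_attend(visible, infrared), cross_attend(infrared, visible)

# Toy features: 2 spatial positions with 2 channels per modality (made-up numbers).
visible = [[1.0, 0.0], [0.0, 1.0]]
infrared = [[0.5, 0.5], [0.5, 0.5]]
vis_enh, ir_enh = enhance_both(visible, infrared)
```

Because of the residual term, each stream keeps its own features and only gains information from the other modality, which is one common way to limit information loss in deeper layers. The three-stage fusion described in the abstract is a separate step and is not modelled here.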

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1093/11244361/58524cc4d582/sensors-24-04098-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1093/11244361/f5b3a2438397/sensors-24-04098-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1093/11244361/1de36a9cf261/sensors-24-04098-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1093/11244361/f79792e91374/sensors-24-04098-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1093/11244361/47819efb2b57/sensors-24-04098-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1093/11244361/5478551583ae/sensors-24-04098-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1093/11244361/ecf277c5e87a/sensors-24-04098-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1093/11244361/c665057d149f/sensors-24-04098-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1093/11244361/7974806ce233/sensors-24-04098-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1093/11244361/792fabe03b86/sensors-24-04098-g010.jpg

Similar articles

1
Object Detection in Multispectral Remote Sensing Images Based on Cross-Modal Cross-Attention.
Sensors (Basel). 2024 Jun 24;24(13):4098. doi: 10.3390/s24134098.
2
OIF-Net: An Optical Flow Registration-Based PET/MR Cross-Modal Interactive Fusion Network for Low-Count Brain PET Image Denoising.
IEEE Trans Med Imaging. 2024 Apr;43(4):1554-1567. doi: 10.1109/TMI.2023.3342809. Epub 2024 Apr 3.
3
IV-YOLO: A Lightweight Dual-Branch Object Detection Network.
Sensors (Basel). 2024 Sep 24;24(19):6181. doi: 10.3390/s24196181.
4
HP-YOLOv8: High-Precision Small Object Detection Algorithm for Remote Sensing Images.
Sensors (Basel). 2024 Jul 26;24(15):4858. doi: 10.3390/s24154858.
5
Object Tracking in RGB-T Videos Using Modal-Aware Attention Network and Competitive Learning.
Sensors (Basel). 2020 Jan 10;20(2):393. doi: 10.3390/s20020393.
6
OD-YOLO: Robust Small Object Detection Model in Remote Sensing Image with a Novel Multi-Scale Feature Fusion.
Sensors (Basel). 2024 Jun 3;24(11):3596. doi: 10.3390/s24113596.
7
Graph Sampling-Based Multi-Stream Enhancement Network for Visible-Infrared Person Re-Identification.
Sensors (Basel). 2023 Sep 18;23(18):7948. doi: 10.3390/s23187948.
8
Bilateral Cross-Modal Fusion Network for Robot Grasp Detection.
Sensors (Basel). 2023 Mar 22;23(6):3340. doi: 10.3390/s23063340.
9
Concrete Highway Crack Detection Based on Visible Light and Infrared Silicate Spectrum Image Fusion.
Sensors (Basel). 2024 Apr 26;24(9):2759. doi: 10.3390/s24092759.
10
MAFF-Net: Multi-Attention Guided Feature Fusion Network for Change Detection in Remote Sensing Images.
Sensors (Basel). 2022 Jan 24;22(3):888. doi: 10.3390/s22030888.
