Regulating Modality Utilization within Multimodal Fusion Networks.

Author information

Singh Saurav, Saber Eli, Markopoulos Panos P, Heard Jamison

Affiliations

Department of Electrical & Microelectronic Engineering, Rochester Institute of Technology, Rochester, NY 14623, USA.

Department of Electrical & Computer Engineering and Department of Computer Science, The University of Texas at San Antonio, San Antonio, TX 78249, USA.

Publication information

Sensors (Basel). 2024 Sep 19;24(18):6054. doi: 10.3390/s24186054.

Abstract

Multimodal fusion networks play a pivotal role in leveraging diverse sources of information to enhance machine learning applications in aerial imagery. However, current approaches often become biased toward certain modalities, which diminishes the potential benefits of multimodal data. This paper addresses this issue by proposing a novel modality utilization-based training method for multimodal fusion networks. The method guides how much the network relies on each of its input modalities, ensuring a balanced integration of complementary information streams and mitigating the overutilization of dominant modalities. The method is validated on multimodal aerial imagery classification and image segmentation tasks, where it keeps modality utilization within ±10% of the user-defined target utilization, demonstrating its versatility and efficacy across applications. Furthermore, the study explores the robustness of the fusion networks against noise in the input modalities, a crucial aspect of real-world scenarios. The method shows better noise robustness, maintaining performance amid environmental changes that affect different aerial imagery sensing modalities. Under noisy conditions (noise variance = 0.12), the network trained with 75.0% electro-optical (EO) utilization achieves significantly higher accuracy (81.4%) than the network trained with the traditional method, whose EO utilization is 99.59% and whose accuracy is 73.7%. Across different noise levels, it also maintains an average accuracy of 85.0%, outperforming the traditional method's average of 81.9%. Overall, the proposed approach is a significant step toward harnessing the full potential of multimodal data fusion in diverse machine learning applications such as robotics, healthcare, satellite imagery, and defense.
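The abstract does not state how modality utilization is measured or how the target is enforced during training. The sketch below illustrates the general idea under explicit assumptions, and is not the authors' method: EO utilization is approximated as EO's share of the loss increase observed when each modality is zeroed out, the target is enforced with a squared penalty (weight `lam`) added to the task loss, and "±10% of the target" is read as an absolute band of ±0.10. All names (`eo_utilization`, `regulated_loss`, `within_tolerance`, the toy model) are hypothetical.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical two-modality fusion model: maps (EO features, SAR features) to one logit.
FusionModel = Callable[[Sequence[float], Sequence[float]], float]


def bce(logit: float, label: int) -> float:
    """Binary cross-entropy for a single logit, with the probability clamped for stability."""
    p = 1.0 / (1.0 + math.exp(-logit))
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def mean_loss(model: FusionModel, eo: List[List[float]], sar: List[List[float]],
              labels: List[int]) -> float:
    """Average task loss over a batch."""
    return sum(bce(model(e, s), y) for e, s, y in zip(eo, sar, labels)) / len(labels)


def eo_utilization(model: FusionModel, eo, sar, labels) -> float:
    """ASSUMED proxy: EO's share of the loss increase caused by zeroing each modality."""
    base = mean_loss(model, eo, sar, labels)
    no_eo = [[0.0] * len(e) for e in eo]
    no_sar = [[0.0] * len(s) for s in sar]
    drop_eo = max(mean_loss(model, no_eo, sar, labels) - base, 0.0)
    drop_sar = max(mean_loss(model, eo, no_sar, labels) - base, 0.0)
    total = drop_eo + drop_sar
    # If removing either modality changes nothing, treat utilization as balanced.
    return 0.5 if total == 0.0 else drop_eo / total


def regulated_loss(model: FusionModel, eo, sar, labels,
                   target_eo: float, lam: float = 1.0) -> float:
    """Task loss plus a hypothetical squared penalty pulling EO utilization toward target_eo."""
    u = eo_utilization(model, eo, sar, labels)
    return mean_loss(model, eo, sar, labels) + lam * (u - target_eo) ** 2


def within_tolerance(u: float, target: float, tol: float = 0.10) -> bool:
    """Reads '±10% of the target' as an absolute band of ±0.10 (an assumption)."""
    return abs(u - target) <= tol


if __name__ == "__main__":
    def eo_heavy(e: Sequence[float], s: Sequence[float]) -> float:
        # Toy model that weights EO four times as heavily as SAR.
        return 2.0 * sum(e) + 0.5 * sum(s)

    eo, sar, labels = [[1.0], [-1.0]], [[1.0], [-1.0]], [1, 0]
    u = eo_utilization(eo_heavy, eo, sar, labels)
    print(f"EO utilization: {u:.3f}")
    print("within ±0.10 of 0.75:", within_tolerance(u, 0.75))
    print("regulated loss (target 0.75):", regulated_loss(eo_heavy, eo, sar, labels, 0.75))
```

Because the toy model leans heavily on EO, its estimated EO utilization lands above a 0.75 target, and the penalty term grows with that gap. A practical version would presumably compute the same quantities on framework tensors so that the penalty contributes gradients during training.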
