
Eye Gaze Guided Cross-Modal Alignment Network for Radiology Report Generation.

Authors

Peng Peixi, Fan Wanshu, Shen Yue, Liu Wenfei, Yang Xin, Zhang Qiang, Wei Xiaopeng, Zhou Dongsheng

Publication

IEEE J Biomed Health Inform. 2024 Dec;28(12):7406-7419. doi: 10.1109/JBHI.2024.3422168. Epub 2024 Dec 5.

DOI: 10.1109/JBHI.2024.3422168
PMID: 38995704
Abstract

The potential benefits of automatic radiology report generation, such as reducing misdiagnosis rates and enhancing clinical diagnosis efficiency, are significant. However, existing data-driven methods lack essential medical prior knowledge, which hampers their performance. Moreover, establishing global correspondences between radiology images and related reports, while achieving local alignments between prior-knowledge-correlated images and text, remains a challenging task. To address these shortcomings, we introduce a novel Eye Gaze Guided Cross-modal Alignment Network (EGGCA-Net) for generating accurate medical reports. Our approach incorporates prior knowledge from radiologists' Eye Gaze Regions (EGR) to refine the fidelity and comprehensibility of report generation. Specifically, we design a Dual Fine-Grained Branch (DFGB) and a Multi-Task Branch (MTB) that collaboratively ensure the alignment of visual and textual semantics across multiple levels. To establish fine-grained alignment between EGR-related images and sentences, we introduce the Sentence Fine-grained Prototype Module (SFPM) within DFGB to capture cross-modal information at different levels. Additionally, to learn the alignment of EGR-related image topics, we introduce the Multi-task Feature Fusion Module (MFFM) within MTB to refine the encoder output. Finally, a tailored label matching mechanism generates reports consistent with the anticipated disease states. Experimental results indicate that the proposed method surpasses previous advanced techniques, yielding enhanced performance on two widely used benchmark datasets: Open-i and MIMIC-CXR.
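The core mechanism the abstract describes is fine-grained cross-modal alignment between gaze-region image features and sentence features. The paper's code is not reproduced here; the following is only a minimal NumPy sketch of the generic contrastive-alignment idea (a symmetric InfoNCE-style loss over paired embeddings). The function names, the temperature value, and the loss formulation are illustrative assumptions, not the authors' EGGCA-Net implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Scale each feature vector to unit length so dot products are cosines.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def contrastive_alignment_loss(img_feats, txt_feats, temperature=0.07):
    """Symmetric InfoNCE loss over paired image/text embeddings.

    img_feats, txt_feats: (N, D) arrays where row i of each is a matched
    pair (e.g., an eye-gaze-region feature and its sentence feature).
    Matched pairs should score higher than all mismatched combinations.
    """
    img = l2_normalize(np.asarray(img_feats, dtype=float))
    txt = l2_normalize(np.asarray(txt_feats, dtype=float))
    logits = img @ txt.T / temperature      # (N, N) similarity matrix
    n = logits.shape[0]                     # matched pairs on the diagonal

    def xent(lg):
        # Cross-entropy of each row against its diagonal target.
        lg = lg - lg.max(axis=1, keepdims=True)   # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

As a sanity check, perfectly matched embeddings should yield a loss near zero, while randomly paired embeddings should score noticeably worse.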


Similar Articles

1. Eye Gaze Guided Cross-Modal Alignment Network for Radiology Report Generation.
IEEE J Biomed Health Inform. 2024 Dec;28(12):7406-7419. doi: 10.1109/JBHI.2024.3422168. Epub 2024 Dec 5.
2. Visual prior-based cross-modal alignment network for radiology report generation.
Comput Biol Med. 2023 Nov;166:107522. doi: 10.1016/j.compbiomed.2023.107522. Epub 2023 Sep 22.
3. Memory-Based Cross-Modal Semantic Alignment Network for Radiology Report Generation.
IEEE J Biomed Health Inform. 2024 Jul;28(7):4145-4156. doi: 10.1109/JBHI.2024.3393018. Epub 2024 Jul 2.
4. Multi-Grained Radiology Report Generation With Sentence-Level Image-Language Contrastive Learning.
IEEE Trans Med Imaging. 2024 Jul;43(7):2657-2669. doi: 10.1109/TMI.2024.3372638. Epub 2024 Jul 1.
5. PhraseAug: An Augmented Medical Report Generation Model With Phrasebook.
IEEE Trans Med Imaging. 2024 Dec;43(12):4211-4223. doi: 10.1109/TMI.2024.3416190. Epub 2024 Dec 2.
6. Translating medical image to radiological report: Adaptive multilevel multi-attention approach.
Comput Methods Programs Biomed. 2022 Jun;221:106853. doi: 10.1016/j.cmpb.2022.106853. Epub 2022 May 4.
7. S3-Net: A Self-Supervised Dual-Stream Network for Radiology Report Generation.
IEEE J Biomed Health Inform. 2024 Mar;28(3):1448-1459. doi: 10.1109/JBHI.2023.3345932. Epub 2024 Mar 6.
8. Unsupervised Visual-Textual Correlation Learning With Fine-Grained Semantic Alignment.
IEEE Trans Cybern. 2022 May;52(5):3669-3683. doi: 10.1109/TCYB.2020.3015084. Epub 2022 May 19.
9. Text-guided Image Restoration and Semantic Enhancement for Text-to-Image Person Retrieval.
Neural Netw. 2025 Apr;184:107028. doi: 10.1016/j.neunet.2024.107028. Epub 2024 Dec 16.
10. Intensive vision-guided network for radiology report generation.
Phys Med Biol. 2024 Feb 5;69(4). doi: 10.1088/1361-6560/ad1995.

Cited By

1. Eye-Guided Multimodal Fusion: Toward an Adaptive Learning Framework Using Explainable Artificial Intelligence.
Sensors (Basel). 2025 Jul 24;25(15):4575. doi: 10.3390/s25154575.
2. Bridging human and machine intelligence: Reverse-engineering radiologist intentions for clinical trust and adoption.
Comput Struct Biotechnol J. 2024 Nov 8;24:711-723. doi: 10.1016/j.csbj.2024.11.012. eCollection 2024 Dec.