Wang Rongsheng, Yao Qingsong, Jiang Zihang, Lai Haoran, He Zhiyang, Tao Xiaodong, Zhou S Kevin
School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China (USTC), Hefei, Anhui, 230026, China; Center for Medical Imaging, Robotics, Analytic Computing & Learning (MIRACLE), Suzhou Institute for Advanced Research, USTC, Suzhou, Jiangsu, 215123, China; Anhui iFLYTEK Co., Ltd., China.
Stanford University, Palo Alto, CA, 94025, United States.
Med Image Anal. 2025 Jun 26;105:103690. doi: 10.1016/j.media.2025.103690.
Despite significant advancements in medical vision-language pre-training, existing methods have largely overlooked the inherent linguistic complexity and imbalance within medical reports, as well as the complex cross-modality contextual relationships between texts and images. To close this gap, we propose a novel Entity-centered Context-aware Medical Vision-language Pre-training (ECAMP) framework, which establishes a more entity-centered, context-sensitive, and balanced understanding of medical reports to effectively pre-train the vision encoder. We first distill entity-centered context from medical reports using large language models, enabling ECAMP to draw more precise supervision from the text modality. By further incorporating entity-aware re-balancing and descriptor masking strategies into masked language modeling, ECAMP significantly enhances the knowledge of entities within the reports. A context-guided super-resolution task is proposed alongside a multi-scale context fusion design to improve the semantic integration of coarse- and fine-level image representations, which yields better performance on multi-scale downstream applications. ECAMP integrates these innovations, leading to significant performance gains over current state-of-the-art methods and establishing a new standard for cross-modality pre-training in medical imaging. The effectiveness of ECAMP is demonstrated by extensive experiments across multiple domains and organs, achieving cutting-edge results on tasks including classification, segmentation, and detection across 5 public chest X-ray datasets and 4 fundoscopy datasets.
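To make the masked-language-modeling ideas in the abstract concrete, the sketch below illustrates the general pattern of masking descriptor tokens more aggressively than ordinary tokens and attaching inverse-frequency loss weights so that rare (often abnormal) findings are not drowned out by frequent "normal" phrasing. This is an illustrative assumption, not the authors' implementation: the descriptor vocabulary, the masking probabilities, and the weighting formula are all hypothetical placeholders.

```python
import random

# Hypothetical descriptor vocabulary and corpus token frequencies; in practice
# these would be distilled from reports (e.g., via a large language model).
DESCRIPTORS = {"mild", "bilateral", "patchy"}
TOKEN_FREQ = {"no": 900, "effusion": 60, "mild": 40,
              "bilateral": 30, "patchy": 10, "opacity": 50}

def mask_report(tokens, p_base=0.15, p_descriptor=0.5, seed=0):
    """Mask descriptor tokens with higher probability and return
    inverse-frequency re-balancing weights for each token (illustrative)."""
    rng = random.Random(seed)
    masked, weights = [], []
    for tok in tokens:
        # Descriptors are masked more often so the model must predict them.
        p = p_descriptor if tok in DESCRIPTORS else p_base
        masked.append("[MASK]" if rng.random() < p else tok)
        # Rare tokens get larger loss weights than frequent ones.
        weights.append(1.0 / TOKEN_FREQ.get(tok, 1))
    return masked, weights

tokens = ["mild", "bilateral", "patchy", "opacity", "no", "effusion"]
masked, weights = mask_report(tokens)
print(masked)
print(weights)
```

With the fixed seed, the rare descriptor "patchy" is masked and receives the largest weight (0.1), while the very frequent token "no" keeps a small weight (1/900), mirroring the intuition of re-balanced, entity-focused supervision.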