IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-Training.

Author information

Liu Che, Cheng Sibo, Shi Miaojing, Shah Anand, Bai Wenjia, Arcucci Rossella

Publication information

IEEE Trans Med Imaging. 2025 Jan;44(1):519-529. doi: 10.1109/TMI.2024.3449690. Epub 2025 Jan 2.

DOI:10.1109/TMI.2024.3449690

Abstract

In medical Vision-Language Pre-training (VLP), significant work focuses on extracting text and image features from clinical reports and medical images. Yet existing methods may have overlooked the potential of the natural hierarchical structure in clinical reports, which are typically divided into 'findings' for description and 'impressions' for conclusions. Current VLP approaches tend to oversimplify these reports into a single entity or fragmented tokens, ignoring this structured format. In this work, we propose a novel clinical prior guided VLP framework named IMITATE to learn the structure information from medical reports with hierarchical vision-language alignment. The framework derives multi-level visual features from chest X-ray (CXR) images and separately aligns these features with the descriptive and the conclusive text encoded in the hierarchical medical report. Furthermore, a new clinical-informed contrastive loss is introduced for cross-modal learning, which accounts for clinical prior knowledge in formulating sample correlations in contrastive learning. The proposed model, IMITATE, outperforms baseline VLP methods across six different datasets, spanning five medical imaging downstream tasks. Experimental results show the benefits of using hierarchical structures in medical reports for VLP. Code: https://github.com/cheliu-computation/IMITATE-TMI2024.
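The two ideas in the abstract, level-wise alignment and a contrastive loss whose targets reflect prior knowledge about sample correlations, can be illustrated with a small dependency-free sketch. This is not the authors' implementation (see the linked repository for that). It assumes that the prior is approximated by report-to-report embedding similarity, that mid-level visual features pair with the 'findings' text and high-level features with the 'impressions' text, and all function names and the `smooth` and `temperature` parameters are hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def normalize(v):
    # L2-normalize a vector so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def soft_targets(text_emb, smooth=0.2, temperature=0.1):
    # ASSUMPTION: the prior is a softmax over text-to-text similarity,
    # blended with the usual one-hot target. Each row sums to 1.
    t = [normalize(v) for v in text_emb]
    n = len(t)
    targets = []
    for i in range(n):
        prior = softmax([dot(t[i], t[j]) / temperature for j in range(n)])
        targets.append([
            (1 - smooth) * (1.0 if i == j else 0.0) + smooth * prior[j]
            for j in range(n)
        ])
    return targets

def soft_contrastive_loss(image_emb, text_emb, targets, temperature=0.1):
    # Cross-entropy between the soft targets and the image-to-text softmax.
    img = [normalize(v) for v in image_emb]
    txt = [normalize(v) for v in text_emb]
    loss = 0.0
    for i, u in enumerate(img):
        probs = softmax([dot(u, w) / temperature for w in txt])
        loss -= sum(q * math.log(p) for q, p in zip(targets[i], probs))
    return loss / len(img)

def hierarchical_loss(mid_feats, high_feats, findings_emb, impression_emb,
                      smooth=0.2):
    # ASSUMPTION: mid-level features align with 'findings' and
    # high-level features align with 'impressions'; the two terms are summed.
    loss_findings = soft_contrastive_loss(
        mid_feats, findings_emb, soft_targets(findings_emb, smooth))
    loss_impression = soft_contrastive_loss(
        high_feats, impression_emb, soft_targets(impression_emb, smooth))
    return loss_findings + loss_impression
```

With `smooth=0` the targets reduce to one-hot vectors and each term becomes a standard image-to-text InfoNCE loss; raising `smooth` lets reports with similar embeddings count as partial positives rather than pure negatives.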

Similar articles

Radiology report generation using automatic keyword adaptation, frequency-based multi-label classification and text-to-text large language models.

Comput Biol Med. 2025 Jul 3;196(Pt A):110625. doi: 10.1016/j.compbiomed.2025.110625.

Short-Term Memory Impairment

Tailoring task arithmetic to address bias in models trained on multi-institutional datasets.

J Biomed Inform. 2025 Aug;168:104858. doi: 10.1016/j.jbi.2025.104858. Epub 2025 Jun 8.

VIIDA and InViDe: computational approaches for generating and evaluating inclusive image paragraphs for the visually impaired.

Disabil Rehabil Assist Technol. 2025 Jul;20(5):1470-1495. doi: 10.1080/17483107.2024.2437567. Epub 2024 Dec 11.

Unlocking the Potential of Weakly Labeled Data: A Co-Evolutionary Learning Framework for Abnormality Detection and Report Generation.

IEEE Trans Med Imaging. 2025 Apr;44(4):1671-1685. doi: 10.1109/TMI.2024.3516954. Epub 2025 Apr 3.

Generalizable diagnosis of chest radiographs through attention-guided decomposition of images utilizing self-consistency loss.

Comput Biol Med. 2024 Sep;180:108922. doi: 10.1016/j.compbiomed.2024.108922. Epub 2024 Jul 31.

Artificial intelligence for diagnosing exudative age-related macular degeneration.

Cochrane Database Syst Rev. 2024 Oct 17;10(10):CD015522. doi: 10.1002/14651858.CD015522.pub2.

Extracting adverse drug events from clinical Notes: A systematic review of approaches used.

J Biomed Inform. 2024 Mar;151:104603. doi: 10.1016/j.jbi.2024.104603. Epub 2024 Feb 6.

ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training.

Med Image Anal. 2025 Jun 26;105:103690. doi: 10.1016/j.media.2025.103690.

Cited by

Vision-language foundation models for medical imaging: a review of current practices and innovations.

Biomed Eng Lett. 2025 Jun 6;15(5):809-830. doi: 10.1007/s13534-025-00484-6. eCollection 2025 Sep.

Concatenated CNN-Based Pneumonia Detection Using a Fuzzy-Enhanced Dataset.

Sensors (Basel). 2024 Oct 21;24(20):6750. doi: 10.3390/s24206750.
