Visual-linguistic Diagnostic Semantic Enhancement for medical report generation.

Author Information

Chen Jiahong, Huang Guoheng, Yuan Xiaochen, Zhong Guo, Tan Zhe, Pun Chi-Man, Yang Qi

Affiliations

School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China.

Publication Information

J Biomed Inform. 2025 Jan;161:104764. doi: 10.1016/j.jbi.2024.104764. Epub 2024 Dec 31.

Abstract

Generative methods are currently popular for medical report generation, as they automatically generate professional reports from input images, assisting physicians in making faster and more accurate decisions. However, current methods face significant challenges: 1) Lesion areas in medical images are often difficult for models to capture accurately, and 2) even when captured, these areas are frequently not described using precise clinical diagnostic terms. To address these problems, we propose a Visual-Linguistic Diagnostic Semantic Enhancement model (VLDSE) to generate high-quality reports. Our approach employs supervised contrastive learning in the Image and Report Semantic Consistency (IRSC) module to bridge the semantic gap between visual and linguistic features. Additionally, we design the Visual Semantic Qualification and Quantification (VSQQ) module and the Post-hoc Semantic Correction (PSC) module to enhance visual semantics and inter-word relationships, respectively. Experiments demonstrate that our model achieves promising performance on the publicly available IU X-RAY and MIMIC-MV datasets. Specifically, on the IU X-RAY dataset, our model achieves a BLEU-4 score of 18.6%, improving the baseline by 12.7%. On the MIMIC-MV dataset, our model improves the BLEU-1 score by 10.7% over the baseline. These results demonstrate the ability of our model to generate accurate and fluent descriptions of lesion areas.
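The abstract describes the IRSC module only at the level of "supervised contrastive learning to bridge the semantic gap between visual and linguistic features," without implementation details. The PyTorch sketch below shows one standard way such a cross-modal supervised contrastive objective can be set up, following the general form of the SupCon loss. It is an illustrative assumption, not the authors' code: the function name, the use of diagnostic labels to define positive pairs, and the temperature value are all hypothetical.

```python
import torch
import torch.nn.functional as F


def supervised_contrastive_alignment(img_emb, txt_emb, labels, temperature=0.07):
    """Supervised contrastive loss over paired image/report embeddings.

    img_emb, txt_emb: (N, D) projected features from the image and text
    encoders. labels: (N,) diagnostic class per study; embeddings that
    share a label (in either modality) are treated as positives.
    This is a generic SupCon-style sketch, not the paper's IRSC module.
    """
    # L2-normalize so dot products are cosine similarities.
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)

    # Pool both modalities into a single batch of 2N "views".
    feats = torch.cat([img_emb, txt_emb], dim=0)        # (2N, D)
    labels = torch.cat([labels, labels], dim=0)         # (2N,)

    sim = feats @ feats.t() / temperature               # (2N, 2N)

    # Exclude each embedding's similarity to itself.
    n = sim.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(self_mask, -1e9)

    # Positives: same diagnostic label, self excluded. Each image always
    # has at least one positive -- its own paired report.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # Row-wise log-softmax, then average over that row's positives.
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    loss = -pos_log_prob / pos_mask.sum(dim=1).clamp(min=1)
    return loss.mean()


# Toy usage: 4 studies, 128-dim projections, 3 diagnostic classes.
img = torch.randn(4, 128)
txt = torch.randn(4, 128)
y = torch.tensor([0, 1, 1, 2])
print(supervised_contrastive_alignment(img, txt, y).item())
```

Pooling both modalities into one batch means an image embedding is pulled toward its own report and toward any image or report sharing its diagnosis, which is the sense in which such a loss could "bridge the semantic gap" between visual and linguistic features, assuming diagnostic labels are available as supervision.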

