Zhu Deng, Liu Lijun, Yang Xiaobing, Liu Li, Peng Wei
College of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan, 650500, P. R. China.
Yunnan Key Laboratory of Computer Technology Application, Kunming, 650500, P. R. China.
J Imaging Inform Med. 2025 Jan 31. doi: 10.1007/s10278-025-01422-9.
Chest radiology report generation plays a vital role in supporting diagnosis, alleviating physician workload, and reducing the risk of misdiagnosis. However, significant challenges persist: (1) data bias and background noise in chest images often obscure subtle lesion details, leading models to generate near-identical reports; (2) the distinct modal spaces of radiology images and reports weaken the semantic correlation between detailed visual lesion features and report sentences; (3) generated reports often lack crucial patient background and disease-extent details, degrading report quality and accuracy. To address these challenges, this paper proposes a novel approach for generating chest radiology reports that combines denoising multi-level cross-attention with multi-level contrastive learning. The proposed method first encodes frontal and lateral radiology images sequentially through a visual extractor to enhance semantic coherence across image patches and improve visual feature representation. The enhanced visual features are then processed by denoising multi-level cross-attention, which suppresses noise and highlights subtle lesion details. Second, a multi-level contrastive learning module applies contrastive learning among images, text, and disease labels to distinguish positive samples from negative ones, thereby strengthening the semantic correlation between detailed visual lesion features and report sentences. Finally, relevant knowledge is incorporated into the report generator to enrich the description of patient lesion details. Comparative experiments against other state-of-the-art methods on the IU-Xray and MIMIC-CXR datasets demonstrate that the proposed method significantly improves model performance. Additionally, ablation studies confirm that each module contributes to the quality of the generated reports.
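The two core mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the shapes, the single-head attention, and the symmetric InfoNCE objective are generic stand-ins for the paper's denoising multi-level cross-attention and image-text contrastive components, written in plain NumPy for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention: one modality's
    queries attend over another modality's patch features (keys/values),
    so informative patches receive high weight and noisy ones are
    suppressed. Shapes: queries (Nq, d), keys/values (Nk, d)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (Nq, Nk) similarity
    weights = softmax(scores, axis=-1)       # attention distribution
    return weights @ values                  # (Nq, d) attended features

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE contrastive loss over a batch of paired
    image/text embeddings: matched pairs are positives, every other
    pairing in the batch is a negative."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature       # (B, B) cosine similarities
    idx = np.arange(len(img))
    # cross-entropy in both directions (image -> text, text -> image)
    loss_i = -np.log(softmax(logits, axis=1)[idx, idx]).mean()
    loss_t = -np.log(softmax(logits, axis=0)[idx, idx]).mean()
    return (loss_i + loss_t) / 2
```

In the paper's multi-level setting this contrastive term would be applied at several levels (image, sentence, disease label) rather than to a single image-text pair, but the positive-versus-negative structure is the same.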