Peng Peixi, Fan Wanshu, Shen Yue, Liu Wenfei, Yang Xin, Zhang Qiang, Wei Xiaopeng, Zhou Dongsheng
IEEE J Biomed Health Inform. 2024 Dec;28(12):7406-7419. doi: 10.1109/JBHI.2024.3422168. Epub 2024 Dec 5.
Automatic radiology report generation offers significant potential benefits, such as reducing misdiagnosis rates and improving clinical diagnostic efficiency. However, existing data-driven methods lack essential medical prior knowledge, which limits their performance. Moreover, establishing global correspondences between radiology images and their reports while achieving local alignment between prior-knowledge-related image regions and text remains challenging. To address these shortcomings, we introduce a novel Eye Gaze Guided Cross-modal Alignment Network (EGGCA-Net) for generating accurate medical reports. Our approach incorporates prior knowledge from radiologists' Eye Gaze Regions (EGR) to improve the fidelity and comprehensibility of the generated reports. Specifically, we design a Dual Fine-Grained Branch (DFGB) and a Multi-Task Branch (MTB) that jointly align visual and textual semantics at multiple levels. To establish fine-grained alignment between EGR-related images and sentences, we introduce the Sentence Fine-grained Prototype Module (SFPM) within DFGB to capture cross-modal information at different levels. In addition, to learn the alignment of EGR-related image topics, we introduce the Multi-task Feature Fusion Module (MFFM) within MTB to refine the encoder output. Finally, a dedicated label matching mechanism generates reports consistent with the anticipated disease states. Experimental results show that the proposed method outperforms previous state-of-the-art approaches on two widely used benchmark datasets: Open-i and MIMIC-CXR.
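To make the described two-branch design more concrete, the following is a minimal PyTorch sketch of the general idea: a fine-grained branch that contrastively aligns pooled eye-gaze-region (EGR) image features with sentence embeddings, and a multi-task branch that fuses global and EGR features for disease-label prediction. The class names, tensor dimensions, and loss choices are illustrative assumptions for exposition only, not the authors' released implementation of DFGB, MTB, SFPM, or MFFM.

```python
# Illustrative sketch only: module names, shapes, and losses are assumptions,
# not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualFineGrainedBranch(nn.Module):
    """Aligns EGR-pooled visual features with sentence embeddings (assumed design)."""
    def __init__(self, vis_dim=2048, txt_dim=768, embed_dim=256):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, embed_dim)
        self.txt_proj = nn.Linear(txt_dim, embed_dim)
        self.temperature = nn.Parameter(torch.tensor(0.07))

    def forward(self, egr_feats, sent_feats):
        # egr_feats: (B, vis_dim) features pooled over the radiologist's gaze region
        # sent_feats: (B, txt_dim) one sentence embedding per image
        v = F.normalize(self.vis_proj(egr_feats), dim=-1)
        t = F.normalize(self.txt_proj(sent_feats), dim=-1)
        logits = v @ t.t() / self.temperature            # (B, B) similarity matrix
        targets = torch.arange(v.size(0), device=v.device)
        # Symmetric contrastive loss: matched image-sentence pairs lie on the diagonal
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

class MultiTaskBranch(nn.Module):
    """Fuses global and EGR features and predicts disease-topic labels (assumed design)."""
    def __init__(self, vis_dim=2048, embed_dim=256, num_labels=14):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(2 * vis_dim, embed_dim), nn.ReLU())
        self.classifier = nn.Linear(embed_dim, num_labels)

    def forward(self, global_feats, egr_feats, labels=None):
        fused = self.fuse(torch.cat([global_feats, egr_feats], dim=-1))
        logits = self.classifier(fused)
        loss = (F.binary_cross_entropy_with_logits(logits, labels)
                if labels is not None else None)
        return logits, loss

# Toy usage with random tensors standing in for encoder outputs.
B = 4
dfgb, mtb = DualFineGrainedBranch(), MultiTaskBranch()
align_loss = dfgb(torch.randn(B, 2048), torch.randn(B, 768))
_, cls_loss = mtb(torch.randn(B, 2048), torch.randn(B, 2048),
                  torch.randint(0, 2, (B, 14)).float())
total_loss = align_loss + cls_loss
```

In this sketch the two losses would be summed with the usual report-generation objective; how the paper actually weights and combines its alignment, classification, and generation terms is not stated in the abstract.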