College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030600, China.
J Biomed Inform. 2024 Sep;157:104718. doi: 10.1016/j.jbi.2024.104718. Epub 2024 Aug 28.
Radiology report generation automates the synthesis of diagnostic narratives from medical imaging data. Current report generation methods primarily employ knowledge graphs for image enhancement, neglecting the interpretability and guiding function of the knowledge graphs themselves. In addition, few approaches leverage the stable modal alignment information available from multimodal pre-trained models to facilitate report generation. We propose Terms-Guided Radiology Report Generation (TGR), a simple and practical model that generates reports guided primarily by anatomical terms. Specifically, we use a dual-stream visual feature extraction module, comprising a detail extraction module and a frozen multimodal pre-trained model, to separately extract visual detail features and semantic features. A Visual Enhancement Module (VEM) is further proposed to enrich the visual features and thereby facilitate the generation of a list of anatomical terms. We integrate the anatomical terms with image features and apply contrastive learning against frozen text embeddings, using the stable feature space of these embeddings to further strengthen modal alignment. The model also accepts manual input, enabling it to generate a list of organs for specific abnormal areas of interest or to produce more accurate single-sentence descriptions based on selected anatomical terms. Comprehensive experiments demonstrate the effectiveness of our method on report generation tasks: our TGR-S model reduces training parameters by 38.9% while performing comparably to current state-of-the-art models, and our TGR-B model exceeds the best baseline models across multiple metrics.
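The abstract describes fusing features from two visual streams and aligning them with frozen text embeddings via contrastive learning. The following is a minimal NumPy sketch of that general idea, not the paper's actual implementation: all dimensions, the concatenation-plus-projection fusion, and the symmetric InfoNCE objective are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    """Normalize vectors to unit length along the given axis."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def info_nce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE contrastive loss over a batch of paired embeddings.

    Matched (image, text) pairs sit on the diagonal of the similarity
    matrix; all off-diagonal entries act as negatives.
    """
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature          # (B, B) cosine similarities
    labels = np.arange(len(logits))

    def xent(lg):
        # numerically stable log-softmax cross-entropy on the diagonal
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average of image->text and text->image directions
    return 0.5 * (xent(logits) + xent(logits.T))

# Hypothetical batch and feature dimensions (illustrative only)
B, D_detail, D_sem, D = 4, 64, 32, 48
detail_feats = rng.normal(size=(B, D_detail))   # detail extraction stream
semantic_feats = rng.normal(size=(B, D_sem))    # frozen pre-trained encoder
W = rng.normal(size=(D_detail + D_sem, D)) * 0.1  # trainable fusion projection

# Fuse the two streams, then contrast against frozen text embeddings
fused = np.concatenate([detail_feats, semantic_feats], axis=1) @ W
frozen_text = rng.normal(size=(B, D))           # stays fixed during training
loss = info_nce_loss(fused, frozen_text)
print(f"contrastive loss: {loss:.4f}")
```

In such a setup only the fusion projection (and the detail stream) would receive gradients; the pre-trained encoder and text embeddings stay frozen, which is what provides the stable alignment target the abstract refers to.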