Liu Aohan, Guo Yuchen, Yong Jun-Hai, Xu Feng
IEEE Trans Med Imaging. 2024 Jul;43(7):2657-2669. doi: 10.1109/TMI.2024.3372638. Epub 2024 Jul 1.
The automatic generation of accurate radiology reports is of great clinical importance and has drawn growing research interest. However, it remains a challenging task due to the imbalance between normal and abnormal descriptions and the multi-sentence, multi-topic nature of radiology reports. These properties make it difficult to generate accurate descriptions for medical images, especially for the important abnormal findings. Previous methods for tackling these problems rely heavily on extra manual annotations, which are expensive to acquire. We propose a multi-grained report generation framework incorporating sentence-level image-sentence contrastive learning, which requires no extra labeling yet effectively learns knowledge from image-report pairs. We first introduce contrastive learning as an auxiliary task for image feature learning. Unlike previous contrastive methods, we exploit the multi-topic nature of imaging reports and perform fine-grained contrastive learning: we extract sentence topics and contents, and contrast sentence contents against refined image contents guided by the sentence topics. This forces the model to learn distinct abnormal image features for each specific topic. During generation, we use two decoders to first produce coarse sentence topics and then the fine-grained text of each sentence. We directly supervise the intermediate topics using the sentence topics learned by our contrastive objective, which strengthens the generation constraint and enables independent fine-tuning of the decoders with reinforcement learning, further boosting model performance. Experiments on two large-scale datasets, MIMIC-CXR and IU-Xray, demonstrate that our approach outperforms existing state-of-the-art methods in terms of both language generation metrics and clinical accuracy.
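To make the fine-grained contrastive objective concrete, the sketch below shows one plausible reading of it: the sentence-topic embedding acts as a cross-attention query over image patch features to produce a topic-refined image representation, which is then contrasted against the sentence-content embedding with a symmetric InfoNCE loss. This is a minimal illustration, not the paper's implementation; all module names, dimensions, the attention-pooling design, and the temperature are assumptions introduced here.

```python
# Minimal sketch of topic-guided image-sentence contrastive learning.
# All layer choices and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicGuidedContrast(nn.Module):
    def __init__(self, img_dim=512, txt_dim=512, embed_dim=256, temperature=0.07):
        super().__init__()
        # The topic embedding queries the image patches, "refining" the
        # image content toward the region relevant to that sentence topic.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.topic_proj = nn.Linear(txt_dim, embed_dim)
        self.content_proj = nn.Linear(txt_dim, embed_dim)
        self.temperature = temperature

    def forward(self, img_feats, topic_emb, content_emb):
        # img_feats:   (B, P, img_dim)  patch features of each image
        # topic_emb:   (B, txt_dim)     embedding of one sentence's topic
        # content_emb: (B, txt_dim)     embedding of that sentence's content
        v = self.img_proj(img_feats)                       # (B, P, E)
        q = self.topic_proj(topic_emb).unsqueeze(1)        # (B, 1, E)
        refined, _ = self.attn(q, v, v)                    # topic-refined image content
        refined = F.normalize(refined.squeeze(1), dim=-1)  # (B, E)
        content = F.normalize(self.content_proj(content_emb), dim=-1)

        # Symmetric InfoNCE: matched (image, sentence-content) pairs are
        # positives; all other pairs in the batch serve as negatives.
        logits = refined @ content.t() / self.temperature  # (B, B)
        targets = torch.arange(logits.size(0), device=logits.device)
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))
```

Because the image representation is re-pooled per topic before the contrast, each topic can attract a different subset of patches, which is one way the model could be pushed to learn distinct abnormal image features per topic as the abstract describes.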
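The coarse-to-fine generation step can likewise be sketched as two chained decoders: a topic decoder emits one latent topic vector per sentence (the intermediate output that receives direct supervision from the contrastively learned sentence topics), and a sentence decoder expands each topic into words. The GRU choices, layer sizes, and teacher-forcing setup below are hypothetical stand-ins for whatever architecture the paper actually uses.

```python
# Minimal sketch of two-stage (topic -> sentence) report decoding.
# Architecture details are illustrative assumptions, not the paper's.
import torch
import torch.nn as nn

class TwoStageReportDecoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden=512, max_sents=8):
        super().__init__()
        self.max_sents = max_sents
        self.topic_rnn = nn.GRUCell(hidden, hidden)      # one step per sentence
        self.topic_head = nn.Linear(hidden, embed_dim)   # intermediate topic output
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.sent_rnn = nn.GRU(embed_dim, hidden, batch_first=True)
        self.word_head = nn.Linear(hidden, vocab_size)
        self.init_from_topic = nn.Linear(embed_dim + hidden, hidden)

    def forward(self, img_global, word_inputs):
        # img_global:  (B, hidden)  pooled image feature
        # word_inputs: (B, S, T)    gold word ids per sentence (teacher forcing)
        B, S, T = word_inputs.shape
        h = img_global
        topics, word_logits = [], []
        for s in range(min(S, self.max_sents)):
            h = self.topic_rnn(img_global, h)            # advance topic state
            topic = self.topic_head(h)                   # (B, E); supervised against
            topics.append(topic)                         # learned sentence topics
            h0 = self.init_from_topic(torch.cat([topic, img_global], -1))
            emb = self.word_embed(word_inputs[:, s])     # (B, T, E)
            out, _ = self.sent_rnn(emb, h0.unsqueeze(0)) # (B, T, hidden)
            word_logits.append(self.word_head(out))      # (B, T, vocab)
        # The topic sequence gets a direct loss against the contrastively
        # learned sentence topics; the word logits get the usual word-level
        # cross-entropy against the reference report.
        return torch.stack(topics, 1), torch.stack(word_logits, 1)
```

Supervising the intermediate topic vectors directly, as the abstract states, decouples the two decoders: each has its own training signal, which is what would allow the independent reinforcement-learning fine-tuning the authors describe.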