Gao Danyang, Kong Ming, Zhao Yongrui, Huang Jing, Huang Zhengxing, Kuang Kun, Wu Fei, Zhu Qiang
Computer School, Beijing Information Science and Technology University, Beijing 100005, China.
College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China.
Med Image Anal. 2024 Jan;91:102982. doi: 10.1016/j.media.2023.102982. Epub 2023 Sep 29.
Medical report generation can be treated as the process by which a doctor observes, understands, and describes images from different perspectives. Following this process, this paper proposes a novel Transformer-based Semantic Query learning paradigm (TranSQ). Briefly, the paradigm learns a set of intention embeddings, issues semantic queries against the visual features, generates intent-compliant candidate sentences, and assembles them into a coherent report. During training, we apply a bipartite matching mechanism to establish a dynamic correspondence between the intention embeddings and the reference sentences, thereby inducing medical concepts into the observation intentions. Experimental results on two major radiology reporting datasets (i.e., IU X-ray and MIMIC-CXR) demonstrate that our model outperforms state-of-the-art models in both generation quality and clinical efficacy. In addition, comprehensive ablation experiments validate the TranSQ model's innovativeness and interpretability. The code is available at https://github.com/zjukongming/TranSQ.
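The bipartite matching mechanism described above can be sketched as a minimum-cost assignment between semantic queries and reference sentences. The snippet below is a minimal, illustrative sketch, not the paper's implementation: it brute-forces the optimal assignment over a hypothetical cost matrix (in practice a Hungarian-algorithm solver such as scipy.optimize.linear_sum_assignment would be used for efficiency, and the costs would come from similarity between generated candidates and reference sentences).

```python
from itertools import permutations

def match_queries_to_sentences(cost):
    """Brute-force bipartite matching: assign each reference sentence j
    to a distinct semantic query i, minimizing the total matching cost.

    cost[i][j] is a hypothetical dissimilarity between the sentence
    candidate produced by query i and reference sentence j.
    Feasible only for small query sets; shown here for clarity.
    """
    n_queries, n_sents = len(cost), len(cost[0])
    best_perm, best_cost = None, float("inf")
    # Each permutation picks one distinct query per reference sentence.
    for perm in permutations(range(n_queries), n_sents):
        total = sum(cost[perm[j]][j] for j in range(n_sents))
        if total < best_cost:
            best_perm, best_cost = perm, total
    # best_perm[j] = index of the query matched to reference sentence j;
    # queries left unmatched correspond to "no sentence emitted".
    return best_perm, best_cost

# Toy example: 3 queries, 2 reference sentences (values are made up).
cost = [
    [1.0, 9.0],  # query 0: close to sentence 0
    [8.0, 1.0],  # query 1: close to sentence 1
    [5.0, 5.0],  # query 2: matched to nothing
]
assignment, total = match_queries_to_sentences(cost)
print(assignment, total)  # (0, 1) 2.0
```

Because the assignment is recomputed each training step, each query gradually specializes toward a consistent observation intention (e.g., a particular anatomical finding), which is what makes the learned intentions interpretable.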