College of Medicine and Biological Information Engineering School, Northeastern University, Shenyang, China; CAS Key Laboratory of Molecular Imaging, Beijing Key Laboratory of Molecular Imaging, The State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China.
Department of Gastroenterology, Hepatology and Nutrition, Shanghai Children's Hospital, Shanghai Jiaotong University, Shanghai, China.
Gastrointest Endosc. 2022 Dec;96(6):929-942.e6. doi: 10.1016/j.gie.2022.07.019. Epub 2022 Jul 30.
The detection rate for early gastric cancer (EGC) is unsatisfactory, and mastering the diagnostic skills of magnifying endoscopy with narrow-band imaging (ME-NBI) requires extensive expertise and experience. We aimed to develop an EGC captioning model (EGCCap) that automatically describes the visual characteristics of ME-NBI images to assist endoscopists.
ME-NBI images (n = 1886) from 294 cases were collected from multiple centers, and 5658 corresponding text descriptions were written following the simple EGC diagnostic algorithm. EGCCap was developed using a multiscale meshed-memory transformer. We evaluated EGCCap comprehensively, covering quantitative and qualitative performance, generalization, robustness, interpretability, and assistive value. The metrics used were BLEU scores, CIDEr, METEOR, ROUGE, SPICE, accuracy, sensitivity, and specificity. All statistical tests were two-sided, and P < .05 was considered statistically significant.
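For readers unfamiliar with the metrics listed above, the sketch below shows how two of them are defined: sentence-level BLEU-1 (clipped unigram precision multiplied by a brevity penalty) and the confusion-matrix definitions of accuracy, sensitivity, and specificity. It is an illustrative sketch only; the abstract does not state which evaluation toolkit the authors used, and reported caption scores such as BLEU1 = 52.434 are typically corpus-level values scaled by 100, not the per-sentence values computed here. The function names `bleu1` and `diagnostic_metrics` are hypothetical.

```python
from collections import Counter
import math


def bleu1(candidate: list[str], references: list[list[str]]) -> float:
    """Sentence-level BLEU-1 (hypothetical helper): clipped unigram precision x brevity penalty."""
    if not candidate:
        return 0.0
    cand_counts = Counter(candidate)
    # Clip each candidate token count by its maximum count in any single reference.
    max_ref_counts: Counter = Counter()
    for ref in references:
        for tok, n in Counter(ref).items():
            max_ref_counts[tok] = max(max_ref_counts[tok], n)
    clipped = sum(min(n, max_ref_counts[tok]) for tok, n in cand_counts.items())
    precision = clipped / len(candidate)
    # Brevity penalty: compare against the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * precision


def diagnostic_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, sensitivity, specificity for binary labels (1 = EGC, 0 = non-EGC).

    Assumes both classes are present in y_true (otherwise a ratio divides by zero).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }
```

For example, a candidate caption identical to its reference scores a BLEU-1 of 1.0, whereas a short caption that repeats one reference word is penalized twice: once by clipping and once by the brevity penalty.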
EGCCap achieved satisfactory captioning performance in the internal test cohort, generating correct, coherent, and clinically meaningful sentences (BLEU1 = 52.434, CIDEr = 36.734, METEOR = 27.823, ROUGE = 49.949, SPICE = 35.548), and retained over 80% of this performance when applied to data from other centers or to corrupted data. The diagnostic ability of endoscopists improved with EGCCap assistance, especially for junior endoscopists (P < .05). Endoscopists gave EGCCap an average rating of 7.182, indicating acceptance of the model.
EGCCap exhibited promising captioning performance and demonstrated satisfactory generalization, robustness, and interpretability. Our study suggests its potential value in aiding and improving the diagnosis of EGC and in facilitating the future development of automated reporting.