Lo Bobby, Møller Bjørn, Igel Christian, Wildt Signe, Vind Ida, Bendtsen Flemming, Burisch Johan, Ibragimov Bulat
Gastro Unit, Medical Section, Copenhagen University Hospital-Amager and Hvidovre, Hvidovre, Denmark.
Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
Am J Gastroenterol. 2025 Feb 28. doi: 10.14309/ajg.0000000000003382.
Endoscopic classification of ulcerative colitis (UC) is subject to high interobserver variation. Previous research demonstrated that artificial intelligence (AI) can match the accuracy of central reading when scoring still images. We now extend this assessment to longer colon segments and integrate AI into the clinical workflow, evaluating its use both for real-time, video-based classification of disease severity and as a support system for physicians.
We trained a convolutional neural network on 2,561 images and 53 videos from 645 patients, labeled with Mayo Endoscopic Subscores (MESs). Using open-set recognition, the model distinguished scorable from unscorable sections of the endoscopy. Validation involved 140 video clips from 44 patients with UC. Six inflammatory bowel disease (IBD) experts and 16 nonexperts rated these videos, with the expert scores serving as the gold standard. We assessed the model's performance and its value as a support system. Finally, the model was alpha-tested as a real-time endoscopic support tool in a real-world patient.
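The abstract does not specify how open-set recognition was implemented. As a minimal sketch of the general idea only, the Python below uses one common, simple approach (rejecting a frame when its maximum softmax probability falls below a threshold) rather than the authors' method. The threshold of 0.7, the `UNSCORABLE` sentinel, the four-class logit layout (MES 0 to 3), and the rule of taking the highest MES among scorable frames as the clip score are all hypothetical assumptions made for illustration.

```python
import numpy as np

UNSCORABLE = -1  # hypothetical sentinel for frames rejected as unscorable


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax over the four MES classes (0-3)."""
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def classify_frames(logits: np.ndarray, threshold: float = 0.7) -> np.ndarray:
    """Assign an MES (0-3) to each frame, or UNSCORABLE when the
    top softmax probability is below `threshold` (assumed value)."""
    probs = softmax(logits)
    preds = probs.argmax(axis=-1)
    confident = probs.max(axis=-1) >= threshold
    return np.where(confident, preds, UNSCORABLE)


def clip_score(frame_scores: np.ndarray):
    """Hypothetical clip-level rule: the most severe MES among scorable
    frames; None if no frame in the clip was scorable."""
    scorable = frame_scores[frame_scores != UNSCORABLE]
    if scorable.size == 0:
        return None
    return int(scorable.max())


# Example: three frames; the second has no confident class and is rejected.
logits = np.array([[5.0, 0.0, 0.0, 0.0],
                   [0.1, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 6.0, 0.0]])
frames = classify_frames(logits)
print(frames.tolist(), clip_score(frames))
```

Maximum-softmax thresholding is only a baseline for open-set recognition; dedicated methods (for example, distance-based or extreme-value approaches) may separate unscorable frames more reliably.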
The model achieved an accuracy of 82%, with no significant difference between the experts and the AI. When used as a support system, it improved the performance of non-IBD experts by 12% and disagreed with the primary physician in 20%-39% of cases. During the alpha test, it was successfully integrated into clinical practice and accurately distinguished between MES 0 and MES 1, consistent with the endoscopists' assessments.
Our innovative AI model shows significant potential to improve both the accuracy of UC severity classification and the proficiency of non-IBD experts. It is designed for clinical use, and real-world testing has shown it to be feasible.