Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, China.
Hubei Provincial Clinical Research Center for Digestive Disease Minimally Invasive Incision, Renmin Hospital of Wuhan University, Wuhan, China.
J Gastroenterol Hepatol. 2024 Aug;39(8):1623-1635. doi: 10.1111/jgh.16615. Epub 2024 May 14.
False positives (FPs) pose a significant challenge in the application of artificial intelligence (AI) for polyp detection during colonoscopy. This study aimed to quantitatively evaluate the impact of FPs from computer-aided polyp detection (CADe) systems on endoscopists.
The model's FPs were categorized into four gradients: 0-5, 5-10, 10-15, and 15-20 FPs per minute (FPPM). Fifty-six colonoscopy videos were collected for a crossover study involving 10 endoscopists. The polyp missed rate (PMR) was the primary outcome. Subsequently, to further verify the impact of FPPM on the assistance capability of AI in clinical settings, a secondary analysis was conducted on a prospective randomized controlled trial (RCT) performed at Renmin Hospital of Wuhan University, China, from July 1 to October 15, 2020, with the adenoma detection rate (ADR) as the primary outcome.
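The FPPM gradients described above can be illustrated with a minimal sketch. It assumes FPPM is the number of false-positive alerts divided by the video duration in minutes, and that each gradient includes its upper bound (consistent with the FPPM ≤ 5 versus FPPM > 5 split used in the results). The function names `fppm` and `fppm_gradient` are hypothetical and are not taken from the study.

```python
def fppm(false_positive_count: int, duration_seconds: float) -> float:
    """False positives per minute for one video (assumed definition)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return false_positive_count / (duration_seconds / 60.0)


def fppm_gradient(rate: float) -> str:
    """Map an FPPM value to one of the four gradients named in the abstract.

    Assumption: upper bounds are inclusive, so 5.0 falls in "0-5".
    """
    if rate < 0 or rate > 20:
        raise ValueError("rate is outside the 0-20 FPPM range covered by the gradients")
    upper_bounds = [5, 10, 15, 20]
    labels = ["0-5", "5-10", "10-15", "15-20"]
    for upper, label in zip(upper_bounds, labels):
        if rate <= upper:
            return label
    raise AssertionError("unreachable")


# Example: 45 false-positive alerts in a 6-minute video.
rate = fppm(45, 360)
print(rate, fppm_gradient(rate))
```

The boundary handling is a choice made for this illustration; the abstract does not state how values exactly on a boundary were assigned.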
Compared with the routine group, CADe reduced PMR when FPPM was less than 5. However, as FPPM increased, the beneficial effect of CADe gradually weakened. In the secondary analysis of the RCT, a total of 956 patients were enrolled. In the AI-assisted groups, ADR was higher when FPPM ≤ 5 than when FPPM > 5 (CADe group: 27.78% vs 11.90%; P = 0.014; odds ratio [OR], 0.351; 95% confidence interval [CI], 0.152-0.812; COMBO group: 38.40% vs 23.46%; P = 0.029; OR, 0.427; 95% CI, 0.199-0.916). After AI intervention, ADR increased when FPPM ≤ 5 (27.78% vs 14.76%; P = 0.001; OR, 0.399; 95% CI, 0.231-0.690), but no statistically significant difference was found when FPPM > 5 (11.90% vs 14.76%; P = 0.788; OR, 1.111; 95% CI, 0.514-2.403).
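To make the direction of the reported odds ratios easier to read, the following sketch computes a crude (unadjusted) OR from two detection rates. It is an illustration only: the helper names are hypothetical, and the abstract does not state whether the published ORs were derived from raw counts or from an adjusted model, so this calculation need not reproduce every reported value. For the CADe group comparison it gives about 0.351, which suggests the OR there expresses the odds of adenoma detection at FPPM > 5 relative to FPPM ≤ 5.

```python
def odds(p: float) -> float:
    """Convert a proportion (0 < p < 1) to odds."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return p / (1 - p)


def crude_odds_ratio(p_group: float, p_reference: float) -> float:
    """Unadjusted odds ratio of p_group relative to p_reference."""
    return odds(p_group) / odds(p_reference)


# CADe group: ADR 11.90% at FPPM > 5 versus 27.78% at FPPM <= 5.
or_cade = crude_odds_ratio(0.1190, 0.2778)
print(round(or_cade, 3))
```

An OR below 1 in this orientation means lower odds of detecting an adenoma in the high-FPPM stratum.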
The FP level of a CADe system affects its effectiveness as an aid to endoscopists, with the greatest benefit observed when FPPM is less than 5.