Digestive Endoscopy Unit, Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy.
Centre for Endoscopic Research Therapeutics and Training (CERTT), Università Cattolica del Sacro Cuore, Rome, Italy.
United European Gastroenterol J. 2022 Oct;10(8):817-826. doi: 10.1002/ueg2.12285. Epub 2022 Aug 19.
Widespread adoption of optical diagnosis of colorectal neoplasia is prevented by suboptimal endoscopist performance and by the lack of standardized training and competence evaluation. We aimed to assess the diagnostic accuracy of endoscopists in optical diagnosis of colorectal neoplasia in the framework of artificial intelligence (AI) validation studies. Literature searches of databases (PubMed/MEDLINE, EMBASE, Scopus) up to April 2022 were performed to identify articles evaluating the accuracy of individual endoscopists in performing optical diagnosis of colorectal neoplasia within studies validating AI against a histologically verified ground truth. The main outcomes were the endoscopists' pooled sensitivity, specificity, positive and negative predictive values (PPV/NPV), positive and negative likelihood ratios (LR+/LR-) and area under the summary receiver operating characteristic curve (AUC for sROC) for predicting adenomas versus non-adenomas. Six studies with 67 endoscopists and 2085 patients (IQR: 115-243.5) were evaluated. Pooled sensitivity and specificity for adenomatous histology were 84.5% (95% CI 80.3%-88%) and 83% (95% CI 79.6%-85.9%), respectively, corresponding to a PPV, NPV, LR+ and LR- of 89.5% (95% CI 87.1%-91.5%), 75.7% (95% CI 70.1%-80.7%), 5 (95% CI 3.9-6.2) and 0.19 (95% CI 0.14-0.25). The AUC was 0.82 (CI 0.76-0.90). Expert endoscopists showed a higher sensitivity than non-experts (90.5% [95% CI 87.6%-92.7%] vs. 75.5% [95% CI 66.5%-82.7%], p < 0.001), and Eastern endoscopists showed a higher sensitivity than Western endoscopists (85% [95% CI 80.5%-88.6%] vs. 75.8% [95% CI 70.2%-80.6%]). Quality was graded high for 3 studies and low for 3 studies. We show that human accuracy for optical diagnosis of colorectal neoplasia in the setting of AI studies is suboptimal. Educational interventions could benefit from AI validation settings, which seem a feasible framework for competence assessment.
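As a rough consistency check on the reported likelihood ratios, the sketch below applies the textbook definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity to the pooled point estimates quoted in the abstract. This is an illustration only: the function name `likelihood_ratios` is hypothetical, and the abstract does not state how the pooled LRs were actually derived, so the meta-analytic model may have computed them differently.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as fractions.

    Hypothetical helper using the standard definitions; not the
    pooling method of the original meta-analysis.
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    lr_pos = sensitivity / (1.0 - specificity)   # how much a positive call raises the odds
    lr_neg = (1.0 - sensitivity) / specificity   # how much a negative call lowers the odds
    return lr_pos, lr_neg


# Pooled point estimates from the abstract: sensitivity 84.5%, specificity 83%.
lr_pos, lr_neg = likelihood_ratios(0.845, 0.83)
print(round(lr_pos, 2), round(lr_neg, 2))
```

With these inputs the values round to about 5 and 0.19, in line with the reported LR+ and LR-.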