Zha Bowen, Cai Angshu, Wang Guiqi
Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.
JMIR Med Inform. 2024 Jul 15;12:e56361. doi: 10.2196/56361.
Some research has already reported the diagnostic value of artificial intelligence (AI) in different endoscopy outcomes. However, the evidence is confusing and of varying quality.
This review aimed to comprehensively evaluate the credibility of the evidence of AI's diagnostic accuracy in endoscopy.
Before the study began, the protocol was registered on PROSPERO (CRD42023483073). First, 2 researchers searched PubMed, Web of Science, Embase, and the Cochrane Library using comprehensive search terms. Then, researchers screened the articles and extracted information. We used A Measurement Tool to Assess Systematic Reviews 2 (AMSTAR2) to evaluate the quality of the articles. When multiple studies addressed the same outcome, we selected the one with the higher quality rating for further analysis. To ensure the reliability of the conclusions, we recalculated each outcome. Finally, the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) approach was used to evaluate the credibility of the outcomes.
A total of 21 studies were included for analysis. According to AMSTAR2, 8 studies were of moderate methodological quality, while the others were of low or critically low quality. The sensitivity and specificity of 17 different outcomes were analyzed. There were 4 studies on the esophagus, 4 on the stomach, and 4 on colorectal regions. Two studies were associated with capsule endoscopy, 2 with laryngoscopy, and 1 with ultrasonic endoscopy. Sensitivity was highest for gastroesophageal reflux disease (97%) and lowest for the invasion depth of colon neoplasia (71%). Specificity was highest for colorectal cancer (98%) and lowest for gastrointestinal stromal tumor (80%). The GRADE evaluation suggested that the reliability of most outcomes was low or very low.
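For readers less familiar with the two metrics reported here, the following is a minimal sketch of how sensitivity and specificity are derived from a 2x2 table of AI results against a reference standard. The class name, function names, and counts are hypothetical and chosen only for illustration; they are not data from the included studies, and the sketch does not reproduce the pooling method used in this review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a 2x2 diagnostic table (hypothetical container)."""
    tp: int  # diseased cases the AI flagged as positive
    fp: int  # non-diseased cases the AI flagged as positive
    fn: int  # diseased cases the AI missed
    tn: int  # non-diseased cases the AI correctly cleared


def sensitivity(c: ConfusionCounts) -> float:
    """Proportion of diseased cases detected: TP / (TP + FN)."""
    diseased = c.tp + c.fn
    if diseased == 0:
        raise ValueError("no diseased cases; sensitivity is undefined")
    return c.tp / diseased


def specificity(c: ConfusionCounts) -> float:
    """Proportion of non-diseased cases correctly cleared: TN / (TN + FP)."""
    healthy = c.tn + c.fp
    if healthy == 0:
        raise ValueError("no non-diseased cases; specificity is undefined")
    return c.tn / healthy


# Illustrative counts only (assumed, not taken from the review).
example = ConfusionCounts(tp=97, fp=20, fn=3, tn=80)
print(f"sensitivity = {sensitivity(example):.2f}")
print(f"specificity = {specificity(example):.2f}")
```

Because the two metrics use different denominators (diseased versus non-diseased cases), an outcome can rank high on one and low on the other, which is why the results report them separately.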
AI proved valuable in endoscopic diagnoses, especially in esophageal and colorectal diseases. These findings provide a theoretical basis for developing and evaluating AI-assisted systems, which are aimed at assisting endoscopists in carrying out examinations, leading to improved patient health outcomes. However, further high-quality research is needed to fully validate AI's effectiveness.