Captos Co., Ltd., Yangsan, Korea.
Department of Applied Statistics, School of Social Science, Gachon University, Seongnam, Korea.
Medicine (Baltimore). 2023 Feb 10;102(6):e32883. doi: 10.1097/MD.0000000000032883.
Studies comparing human judgment with artificial intelligence (AI) in detecting clean mucosal areas in capsule endoscopy (CE) are rare. This study statistically analyzed gastroenterologist judgments and AI results. Three hundred CE video clips (from 100 patients) were prepared. Five gastroenterologists classified the video clips into 3 groups (≥75% [high], 50%-75% [middle], and <50% [low]) according to their subjective judgment of cleanliness. Visualization scores were calculated using an AI algorithm based on the predicted visible area, and the 5 gastroenterologists' judgments were compared with the AI results. Across the 5 gastroenterologists, the proportion of clips rated "high" ranged from 10.7% to 36.7%, and the proportion rated "low" ranged from 28.7% to 60.3%. The AI rated 27.7% of clips as "high" and 29.7% as "low." Repeated-measures analysis of variance (ANOVA) revealed significant differences among the 6 evaluators (5 gastroenterologists and 1 AI) (P < .001). All 5 gastroenterologists gave the same judgment for 90 of the 300 clips (30%), and the AI judgment agreed with theirs in 82 of those 90 clips (91.1%). The "high" and "low" judgments of the gastroenterologists and AI agreed in 95.0% and 94.9% of cases, respectively. Bonferroni's multiple comparison test showed no significant difference between the AI and 3 of the gastroenterologists (P = .0961, P = 1.0000, and P = .0676, respectively) but a significant difference between the AI and the other 2 (P < .0001). When evaluating CE images for cleanliness, the judgments of the 5 gastroenterologists were relatively diverse. The AI produced a relatively universal judgment that was consistent with the gastroenterologists' judgments.
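The agreement figures above (clips on which all 5 gastroenterologists concur, and the share of those on which the AI concurs) can be illustrated with a minimal Python sketch. The function name `unanimous_agreement` and the four toy clips are hypothetical and are not study data. The sketch assumes the AI output has already been mapped to the same "high"/"middle"/"low" labels; the abstract does not state how that mapping was done.

```python
def unanimous_agreement(rater_labels, ai_labels):
    """Count clips where every human rater gave the same label, and how
    many of those clips the AI labeled the same way.

    rater_labels: list of tuples, one tuple of human labels per clip.
    ai_labels:    list of AI labels, one per clip (assumed pre-categorized).
    Returns (n_unanimous, n_ai_agree).
    """
    n_unanimous = 0
    n_ai_agree = 0
    for human, ai in zip(rater_labels, ai_labels):
        if len(set(human)) == 1:      # all raters chose the same category
            n_unanimous += 1
            if human[0] == ai:        # AI matches the unanimous category
                n_ai_agree += 1
    return n_unanimous, n_ai_agree


# Hypothetical toy data for illustration only (not from the study).
raters = [
    ("high", "high", "high", "high", "high"),
    ("low", "low", "low", "low", "low"),
    ("high", "middle", "high", "low", "middle"),
    ("middle", "middle", "middle", "middle", "middle"),
]
ai = ["high", "low", "high", "low"]

n_unanimous, n_ai_agree = unanimous_agreement(raters, ai)
print(n_unanimous, n_ai_agree)  # 3 2

# Arithmetic check of the percentages reported in the abstract.
print(round(100 * 90 / 300, 1))  # 30.0
print(round(100 * 82 / 90, 1))   # 91.1
```

Note that the denominator for the AI agreement rate is the number of unanimous clips (90), not the full set of 300 clips.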