Cortegoso Valdivia Pablo, Deding Ulrik, Bjørsum-Meyer Thomas, Baatrup Gunnar, Fernández-Urién Ignacio, Dray Xavier, Boal-Carvalho Pedro, Ellul Pierre, Toth Ervin, Rondonotti Emanuele, Kaalby Lasse, Pennazio Marco, Koulaouzidis Anastasios
Gastroenterology and Endoscopy Unit, University Hospital of Parma, University of Parma, 43126 Parma, Italy.
Department of Clinical Research, University of Southern Denmark, 5230 Odense, Denmark.
Diagnostics (Basel). 2022 Oct 2;12(10):2400. doi: 10.3390/diagnostics12102400.
Video-capsule endoscopy (VCE) reading is a time- and energy-consuming task. Agreement on findings between readers (whether different readers or the same reader at different times) is crucial for improving performance and producing valid reports. The aim of this systematic review with meta-analysis is to evaluate inter- and intra-observer agreement in VCE reading. A systematic literature search in PubMed, Embase and Web of Science was performed through September 2022. The degree of observer agreement, expressed with different test statistics, was extracted. As different statistics are not directly comparable, our analyses were stratified by type of test statistic, dividing them into groups of "None/Poor/Minimal", "Moderate/Weak/Fair", "Good/Excellent/Strong" and "Perfect/Almost perfect" and reporting the proportion in each. In total, 60 studies were included in the analysis, with a total of 579 comparisons. The quality of the included studies, assessed with the MINORS score, was sufficient in 52/60 studies. The most common test statistics were the Kappa statistic for categorical outcomes (424 comparisons) and the intra-class correlation coefficient (ICC) for continuous outcomes (73 comparisons). In the overall comparison of inter-observer agreement, only 23% of comparisons were rated "good" or "perfect"; for intra-observer agreement, this was the case in 36%. Sources of heterogeneity (high, I² 81.8-98.1%) were investigated with meta-regressions, showing a possible role of country, capsule type and year of publication in Kappa inter-observer agreement. VCE reading suffers from substantial heterogeneity and sub-optimal agreement in both inter- and intra-observer evaluation. Artificial-intelligence-based tools and the adoption of a unified terminology may progressively enhance levels of agreement in VCE reading.
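To illustrate the kind of agreement statistic the review pools, the sketch below computes Cohen's kappa for two hypothetical VCE readers' categorical calls and maps it into coarse agreement bands. The reader labels, function names and band cut-offs are illustrative assumptions, not the data or the exact thresholds used in the review.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two readers' categorical calls on the same items."""
    assert len(ratings_a) == len(ratings_b) and ratings_a
    n = len(ratings_a)
    # Observed agreement: fraction of items both readers label identically.
    po = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected chance agreement from each reader's marginal label frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    pe = sum((freq_a[c] / n) * (freq_b[c] / n)
             for c in set(ratings_a) | set(ratings_b))
    return (po - pe) / (1 - pe)

def agreement_band(kappa):
    """Illustrative banding; the cut-offs are assumptions, not the review's."""
    if kappa < 0.40:
        return "None/Poor/Minimal"
    if kappa < 0.60:
        return "Moderate/Weak/Fair"
    if kappa < 0.90:
        return "Good/Excellent/Strong"
    return "Perfect/Almost perfect"

# Hypothetical per-frame calls by two readers (not data from the review).
reader_a = ["lesion", "normal", "lesion", "lesion", "normal", "normal", "lesion", "normal"]
reader_b = ["lesion", "normal", "normal", "lesion", "normal", "lesion", "lesion", "normal"]
k = cohens_kappa(reader_a, reader_b)
print(f"kappa = {k:.2f} -> {agreement_band(k)}")  # kappa = 0.50 -> Moderate/Weak/Fair
```

A kappa of 0.50 lands in the middle band, matching the review's observation that most inter-observer comparisons fall short of "good" or "perfect" agreement.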