Department of Oral Diagnostics, Digital Health and Health Services Research, Charité - Universitätsmedizin Berlin, Germany.
Topic Group Dental Diagnostics and Digital Dentistry, WHO Focus Group AI on Health, Berlin, Germany.
J Dent. 2024 Nov;150:105318. doi: 10.1016/j.jdent.2024.105318. Epub 2024 Aug 27.
To improve reporting and comparability and to reduce bias in dental computer vision studies, we aimed to develop a Core Outcome Measures Set (COMS) for this field. The COMS was derived by consensus as part of the WHO/ITU/WIPO Global Initiative AI for Health (WHO/ITU/WIPO AI4H).
We first assessed existing guidance documents for diagnostic accuracy studies and conducted interviews with experts in the field. The resulting list of outcome measures was mapped against computer vision modeling tasks, clinical fields and reporting levels. The resulting systematization focused on providing relevant outcome measures while retaining the details needed for meta-research and technical replication, and it gives recommendations on (1) levels of reporting for different clinical fields and tasks, and (2) outcome measures. Consensus on the COMS was reached in a two-stage e-Delphi process with 26 participants from various IADR groups, the WHO/ITU/WIPO AI4H, ADEA and AAOMFR.
We assigned agreed levels of reporting to different computer vision tasks. We agreed that human expert assessment combined with diagnostic accuracy considerations is the only feasible way to achieve clinically meaningful evaluation levels. Studies should report at least eight core outcome measures: confusion matrix, accuracy, sensitivity, specificity, precision, F1-score, area under the receiver operating characteristic curve, and area under the precision-recall curve.
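As an illustration only, the eight core outcome measures could be computed for a binary task roughly as follows. This is a minimal sketch using standard textbook definitions, not the consensus group's own implementation: the function names, the 0.5 decision threshold, and the use of average precision as the summary of the precision-recall curve are all assumptions made for the example.

```python
from typing import Dict, Sequence

NAN = float("nan")


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else NAN


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return NAN
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (assumed summary)."""
    n_pos = sum(1 for t in y_true if t == 1)
    if n_pos == 0:
        return NAN
    area, prev_recall = 0.0, 0.0
    # Sweep each distinct score as a threshold, from strictest to most lenient.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        recall = tp / n_pos
        area += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return area


def core_outcome_measures(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, object]:
    """Compute the eight measures; `threshold` is an assumed cut-off."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    sensitivity = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    return {
        "confusion_matrix": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "precision": precision,
        "f1_score": _ratio(2 * precision * sensitivity, precision + sensitivity),
        "auroc": auroc(y_true, scores),
        "auprc": average_precision(y_true, scores),
    }


if __name__ == "__main__":
    # Hypothetical labels (1 = finding present) and model scores.
    labels = [1, 1, 1, 0, 0, 0, 1, 0]
    model_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
    for name, value in core_outcome_measures(labels, model_scores).items():
        print(name, value)
```

Returning the raw confusion matrix alongside the derived values keeps the report reproducible, since accuracy, sensitivity, specificity, precision and F1-score can all be recomputed from it. Multi-class, detection or segmentation tasks would need task-specific adaptations that this sketch does not cover.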
Dental researchers should aim to report computer vision studies in line with the outlined COMS. Reviewers and editors may consider the defined COMS when assessing studies, and authors are encouraged to justify any decision not to employ it.
Comparing and synthesizing dental computer vision studies is hampered by the variety of reported outcome measures. Adherence to the defined COMS is expected to increase comparability across studies, enable synthesis, and reduce selective reporting.