Casey Eye Institute, Department of Ophthalmology, Oregon Health & Science University, Portland, Oregon.
National Eye Institute, National Institutes of Health, Bethesda, Maryland.
Ophthalmology. 2022 Jul;129(7):e69-e76. doi: 10.1016/j.ophtha.2022.02.008. Epub 2022 Feb 12.
To validate a vascular severity score as an appropriate output for artificial intelligence (AI) Software as a Medical Device (SaMD) for retinopathy of prematurity (ROP) through comparison with ordinal disease severity labels for stage and plus disease assigned by the International Classification of Retinopathy of Prematurity, Third Edition (ICROP3), committee.
Validation study of an AI-based ROP vascular severity score.
A total of 34 ROP experts from the ICROP3 committee.
Two separate datasets of 30 fundus photographs each, one for stage (0-5) and one for plus disease (plus, preplus, neither), were labeled by members of the ICROP3 committee using an open-source platform. Averaging these labels produced a continuous label for plus (1-9) and stage (1-3) for each image. Experts were also asked to compare the images pairwise in terms of relative severity of plus disease. Each image was also assigned a vascular severity score by the Imaging and Informatics in ROP deep learning system, and this score was compared with each grader's diagnostic labels, as well as with the ophthalmoscopic diagnosis of stage.
Weighted kappa and Pearson correlation coefficients (CCs) were calculated between the classification labels of each pair of graders for stage and plus disease. The Elo algorithm was also used to convert each expert's pairwise comparisons into an ordered set of images from least to most severe.
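As an illustration of how an Elo-style update can turn pairwise severity judgments into an ordering, the following is a minimal sketch and not the study's implementation. The function name `elo_rank`, the K-factor of 32, the starting rating of 1500, the 400-point scale, and the number of passes are all assumptions chosen for the example; the abstract does not report these settings.

```python
import random


def elo_rank(comparisons, n_images, k=32.0, start=1500.0, passes=20, seed=0):
    """Order images from least to most severe using Elo-style updates.

    comparisons: list of (more_severe, less_severe) image index pairs,
                 one per pairwise judgment made by a single expert.
    All numeric defaults are illustrative assumptions, not values from the study.
    """
    ratings = [start] * n_images
    rng = random.Random(seed)
    order = list(comparisons)
    for _ in range(passes):
        # Shuffle so the result depends less on the order of presentation.
        rng.shuffle(order)
        for more, less in order:
            # Expected probability that `more` is judged the more severe image.
            expected = 1.0 / (1.0 + 10 ** ((ratings[less] - ratings[more]) / 400.0))
            delta = k * (1.0 - expected)
            ratings[more] += delta
            ratings[less] -= delta
    # Lowest rating first: least severe to most severe.
    return sorted(range(n_images), key=lambda i: ratings[i])


# Hypothetical judgments for 4 images in which a higher index is always
# judged more severe than a lower one.
judgments = [(j, i) for i in range(4) for j in range(i + 1, 4)]
ranking = elo_rank(judgments, n_images=4)
```

Running the update over several shuffled passes is one common way to reduce sensitivity to comparison order when each pair is judged only once.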
The mean weighted kappa and CC for all interobserver pairs for plus disease image comparison were 0.67 and 0.88, respectively. The vascular severity score was found to be highly correlated with both the average plus disease classification (CC = 0.90, P < 0.001) and the ophthalmoscopic diagnosis of stage (P < 0.001 by analysis of variance) among all experts.
The ROP vascular severity score correlates well with the International Classification of Retinopathy of Prematurity committee members' labels for plus disease and stage, which themselves showed significant intergrader variability. Generation of a consensus for a validated scoring system for ROP SaMD can facilitate global innovation and regulatory authorization of these technologies.