Department of Ophthalmology, Byers Eye Institute, Stanford School of Medicine, 2452 Watson Court, Palo Alto, CA, 94303, USA.
Retina Service, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA, 02114, USA.
Sci Rep. 2021 Mar 8;11(1):5369. doi: 10.1038/s41598-021-84723-7.
To describe a database of longitudinally graded telemedicine retinal images to be used as a comparator for future studies assessing grader recall bias and the ability to detect typical progression (e.g., International Classification of Retinopathy of Prematurity (ICROP) stages) as well as incremental changes in retinopathy of prematurity (ROP). The cohort comprised retinal images from 84 eyes of 42 patients who were sequentially screened for ROP over 6 consecutive weeks in a telemedicine program and then followed to vascular maturation or to treatment and subsequent disease stabilization. De-identified retinal images across the 6 weekly exams (2520 total images) were graded by an ROP expert according to whether ROP had improved, worsened, or stayed the same compared to the prior week's images, corresponding to an overall clinical "gestalt" score. Subsequently, we examined which parameters might have influenced the examiner's ability to detect longitudinal change: the same ROP expert graded the images by image view (central, inferior, nasal, superior, temporal) and by retinal component (vascular tortuosity, vascular dilation, stage, hemorrhage, vessel growth), again determining whether each retinal component, or ROP in each image view, had improved, worsened, or stayed the same compared to the prior week's images. Agreement between gestalt scores and the view, component, and component-by-view scores was assessed using percent agreement, absolute agreement, and Cohen's weighted kappa statistic to determine whether any of the hypothesized image features correlated with the ability to predict ROP disease trajectory in patients. The central view showed substantial agreement with gestalt scores (κ = 0.63), with moderate agreement in the remaining views. Of the retinal components, vascular tortuosity showed the highest overall agreement with gestalt scores (κ = 0.42-0.61), with only slight to fair agreement for all other components.
This is a well-defined ROP database, graded by one expert in a masked fashion in a real-world setting, whose grades correlated with the actual (remote in time) exams and known outcomes. It provides a foundation for subsequent study of telemedicine's ability to longitudinally assess ROP disease trajectory, as well as for potential artificial intelligence approaches to retinal image grading, in order to expand patient access to timely, accurate ROP screening.
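As an illustration of the agreement statistics named in the abstract, the following minimal sketch computes percent agreement and Cohen's weighted kappa between two sets of three-level change grades (for example, gestalt scores versus central-view scores). It is not the authors' analysis code. The label names, the ordering of the categories, the function names, and the choice of linear disagreement weights are all assumptions made for illustration; the abstract does not state which weighting scheme was used.

```python
# Illustrative sketch only: category labels, their ordinal order, and the
# linear weighting scheme are assumptions, not details taken from the study.
CATEGORIES = ["worsened", "same", "improved"]


def percent_agreement(grades_a, grades_b):
    """Fraction of paired grades that are identical."""
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("grade lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(grades_a, grades_b))
    return matches / len(grades_a)


def weighted_kappa(grades_a, grades_b, categories=CATEGORIES):
    """Cohen's weighted kappa with linear disagreement weights."""
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("grade lists must be non-empty and of equal length")
    n = len(grades_a)
    k = len(categories)
    index = {label: i for i, label in enumerate(categories)}

    # Observed joint proportions of (grade_a, grade_b) pairs.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(grades_a, grades_b):
        observed[index[a]][index[b]] += 1 / n

    # Marginal proportions for each set of grades.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Linear weights: 0 on the diagonal, 1 for the most distant categories.
    weight = [[abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]

    observed_disagreement = sum(
        weight[i][j] * observed[i][j] for i in range(k) for j in range(k)
    )
    expected_disagreement = sum(
        weight[i][j] * row[i] * col[j] for i in range(k) for j in range(k)
    )
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when all grades fall in one category")
    return 1 - observed_disagreement / expected_disagreement
```

Weighted kappa penalizes a "worsened" versus "improved" disagreement more heavily than a "worsened" versus "same" disagreement, which is why it suits ordinal change grades better than unweighted kappa.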