Riotto Eleonora, Tsai Wei-Shan, Khalid Hagar, Lamanna Francesca, Roch Louise, Manoj Medha, Sivaprasad Sobha
Moorfields Eye Hospital NHS Foundation Trust, 162 City Road, London EC1V 2PD, UK.
Hampshire Hospitals NHS Foundation Trust, Aldermaston Road, Basingstoke RG24 9NA, UK.
Diagnostics (Basel). 2025 Jul 21;15(14):1831. doi: 10.3390/diagnostics15141831.
Background/Objectives: Discrepancies in diabetic retinopathy (DR) grading are well-documented, with retinal non-perfusion (RNP) quantification posing greater challenges. This study assessed intergrader agreement in DR evaluation, focusing on qualitative severity grading and quantitative RNP measurement. We aimed to improve agreement through structured consensus meetings.

Methods: A retrospective analysis of 100 comparisons from 50 eyes (36 patients) was conducted. Two paired medical retina fellows graded ultra-widefield color fundus photographs (CFP) and fundus fluorescein angiography (FFA) images. CFP assessments included DR severity using the International Clinical Diabetic Retinopathy (ICDR) grading system, DR Severity Scale (DRSS), and predominantly peripheral lesions (PPL). FFA-based RNP was defined as capillary loss with grayscale matching the foveal avascular zone. Weekly adjudication by a senior specialist resolved discrepancies. Intergrader agreement was evaluated using Cohen's kappa (qualitative DRSS) and intraclass correlation coefficients (ICC) (quantitative RNP). Bland-Altman analysis assessed bias and variability.

Results: After eight consensus meetings, CFP grading agreement improved to excellent: kappa = 91% (ICDR DR severity), 89% (DRSS), and 89% (PPL). FFA-based PPL agreement reached 100%. For RNP, the non-perfusion index (NPI) showed moderate overall ICC (0.49), with regional ICCs ranging from 0.40 to 0.57 (highest in the nasal region, ICC = 0.57). Bland-Altman analysis revealed a mean NPI difference of 0.12 (limits of agreement: -0.11 to 0.35), indicating acceptable variability despite outliers.

Conclusions: Structured consensus training achieved excellent intergrader agreement for DR severity and PPL grading, supporting the clinical reliability of ultra-widefield imaging. However, RNP measurement variability underscores the need for standardized protocols and automated tools to enhance reproducibility. This process is critical for developing robust AI-based screening systems.
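The two agreement statistics named in the abstract can be illustrated with a short sketch. This is not the authors' analysis code: the function names `cohen_kappa` and `bland_altman` are hypothetical, the kappa shown is the unweighted form (the abstract does not say whether a weighted variant was used), and the Bland-Altman limits assume the conventional mean difference plus or minus 1.96 standard deviations. That convention is at least consistent with the reported figures, since 0.12 ± 0.23 gives -0.11 to 0.35. The ICC is left out because the abstract does not state which ICC model was applied.

```python
from collections import Counter
from statistics import mean, stdev


def cohen_kappa(grades_a, grades_b):
    """Unweighted Cohen's kappa for two graders' categorical labels.

    Assumption: each position in the two lists refers to the same eye.
    """
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("both graders must rate the same non-empty set of eyes")
    n = len(grades_a)
    # Observed agreement: fraction of eyes given the same grade by both graders.
    observed = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    # Chance agreement: product of each grader's marginal category frequencies.
    freq_a, freq_b = Counter(grades_a), Counter(grades_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both graders use one category only")
    return (observed - expected) / (1 - expected)


def bland_altman(values_a, values_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Assumption: limits of agreement are bias +/- 1.96 * sample SD of the
    paired differences (the conventional Bland-Altman definition).
    """
    if len(values_a) != len(values_b) or len(values_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(values_a, values_b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width
```

With per-eye severity grades from each fellow passed to `cohen_kappa` and per-eye NPI values passed to `bland_altman`, the first call yields a chance-corrected agreement coefficient and the second yields the mean intergrader difference with its limits of agreement. Any input data used to try the sketch would be illustrative only and is not taken from the study.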