Department of Medicine and Optometry, Linnaeus University, Kalmar, Sweden.
Low Vision and Visual Rehabilitation Lab, Department and Center of Physics-Optometry and Vision Science, University of Minho Braga, Braga, Portugal.
PLoS One. 2019 Jun 7;14(6):e0216775. doi: 10.1371/journal.pone.0216775. eCollection 2019.
First, to evaluate inter-rater reliability when human raters estimate the reading performance of visually impaired individuals using the MNREAD acuity chart. Second, to evaluate the agreement between computer-based scoring algorithms and compare them with human rating.
Reading performance was measured for 101 individuals with low vision using the Portuguese version of the MNREAD test. Seven raters estimated the maximum reading speed (MRS) and critical print size (CPS) from each individual's MNREAD curve. MRS and CPS were also computed automatically for each curve using two different algorithms: the original standard-deviation method (SDev) and non-linear mixed-effects (NLME) modeling. Intra-class correlation coefficients (ICC) were used to estimate absolute agreement between raters and/or algorithms.
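To make the agreement statistic concrete, below is a minimal sketch of one common ICC form for absolute agreement, ICC(2,1) (two-way random effects, single rater), computed from a subjects-by-raters matrix. The abstract does not state which ICC variant the authors used, so this is purely illustrative; the function name and formula follow the standard Shrout-and-Fleiss parameterization.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: array-like of shape (n_subjects, k_raters).
    Illustrative only; the paper's exact ICC variant is not specified here.
    """
    r = np.asarray(ratings, dtype=float)
    n, k = r.shape
    grand = r.mean()
    # Partition the total sum of squares into subject, rater, and error parts.
    ss_total = ((r - grand) ** 2).sum()
    ss_subj = k * ((r.mean(axis=1) - grand) ** 2).sum()
    ss_rater = n * ((r.mean(axis=0) - grand) ** 2).sum()
    ss_err = ss_total - ss_subj - ss_rater
    msr = ss_subj / (n - 1)            # between-subjects mean square
    msc = ss_rater / (k - 1)           # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1)) # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

With perfect agreement the ICC is 1; a constant offset between raters lowers it, because absolute-agreement ICC penalizes systematic rater bias, unlike a plain correlation.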
Absolute agreement between raters was 'excellent' for MRS (ICC = 0.97; 95% CI [0.96, 0.98]) and 'moderate' to 'good' for CPS (ICC = 0.77; 95% CI [0.69, 0.83]). For CPS, inter-rater reliability was poorer among less experienced raters (ICC = 0.70; 95% CI [0.57, 0.80]) than among experienced ones (ICC = 0.82; 95% CI [0.76, 0.88]). Absolute agreement between the two algorithms was 'excellent' for MRS (ICC = 0.96; 95% CI [0.91, 0.98]). For CPS, the best agreement between algorithms was obtained when CPS was defined as the print size sustaining 80% of MRS (ICC = 0.77; 95% CI [0.68, 0.84]). Absolute agreement between raters and automated methods was 'excellent' for MRS (ICC = 0.96; 95% CI [0.88, 0.98] for SDev; ICC = 0.97; 95% CI [0.95, 0.98] for NLME). For CPS, absolute agreement between raters and SDev ranged from 'poor' to 'good' (ICC = 0.66; 95% CI [0.30, 0.80]), while agreement between raters and NLME was 'good' (ICC = 0.83; 95% CI [0.76, 0.88]).
For MRS, inter-rater reliability is excellent, even allowing for the noisy and/or incomplete data often collected from low-vision individuals. For CPS, inter-rater reliability is lower, which may be problematic, for instance in multisite investigations or follow-up examinations. The NLME method showed better agreement with the raters than the SDev method for both reading parameters. Establishing consensus guidelines for handling ambiguous curves may help improve reliability. While the exact definition of CPS should be chosen case by case depending on the clinician's or researcher's aims, the evidence suggests that estimating CPS as the smallest print size sustaining about 80% of MRS would increase inter-rater reliability.
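The 80%-of-MRS rule can be sketched in a few lines, assuming reading speed has been measured at a series of print sizes expressed in logMAR (where a smaller value means smaller print). The function and variable names are illustrative; the actual SDev and NLME scoring methods fit the whole curve rather than taking the raw maximum, which is sensitive to noise.

```python
import numpy as np

def mrs_and_cps(print_sizes_logmar, speeds_wpm, criterion=0.8):
    """Toy estimate of MRS and CPS from an MNREAD-style curve.

    print_sizes_logmar: print sizes in logMAR (smaller = smaller print).
    speeds_wpm: reading speed at each print size, in words per minute.
    criterion: fraction of MRS that must be sustained (0.8 per the abstract).
    Illustrative sketch only, not the SDev or NLME algorithm.
    """
    sizes = np.asarray(print_sizes_logmar, dtype=float)
    spd = np.asarray(speeds_wpm, dtype=float)
    mrs = spd.max()  # crude plateau estimate; curve fitting is more robust
    sustained = spd >= criterion * mrs
    cps = sizes[sustained].min()  # smallest print still read at >= 80% of MRS
    return mrs, cps
```

For example, if speed holds near 100 wpm down to 0.4 logMAR and then collapses at smaller sizes, CPS under this rule is 0.4 logMAR.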