Ferri Raffaele, Bruni Oliviero, Miano Silvia, Smerieri Arianna, Spruyt Karen, Terzano Mario G
Department of Neurology I.C., Sleep Research Centre, Oasi Institute for Research on Mental Retardation and Brain Aging (IRCCS), Via C. Ruggero 73, 94018 Troina, Italy.
Clin Neurophysiol. 2005 Mar;116(3):696-707. doi: 10.1016/j.clinph.2004.09.021. Epub 2004 Nov 10.
To assess inter-rater reliability in the visual scoring of the Cyclic Alternating Pattern (CAP) between scorers from different qualified sleep research groups, to evaluate the performance of a new tool for the computer-assisted detection of CAP, and to compare its output with the data from the different scorers.
CAP was scored in 11 normal sleep recordings by four raters from three sleep laboratories. CAP was also scored in the same recordings by means of a new computer-assisted method implemented in the Hypnolab 1.2 software (SWS Soft, Italy). Data analysis was performed according to the following steps: (a) the inter-rater reliability of CAP parameters between the four scorers was assessed by means of the Kendall W coefficient of concordance; (b) the agreement between the results of the visual and the computer-assisted analysis of CAP parameters was also assessed by means of the Kendall W coefficient; (c) a 'consensus' scoring was obtained for each recording from the four scorings provided by the different raters, based on the score assigned by the majority of scorers; (d) the degree of agreement between each scorer and the consensus scoring, and between the computer-assisted analysis and the consensus scoring, was quantified by means of Cohen's k coefficient; (e) the differences between the number of false positive and false negative detections obtained in the visual and in the computer-assisted analysis were evaluated by means of the non-parametric Wilcoxon test.
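As a rough illustration of the statistics named above, the following Python sketch computes Kendall's W, a majority-vote consensus, Cohen's kappa against that consensus, and a Wilcoxon signed-rank test on paired detection-error counts. It is a minimal sketch, assuming scipy and scikit-learn are available; the toy data, variable names, and helper function are hypothetical and do not reproduce the authors' implementation or dataset.

```python
# Illustrative sketch (not the authors' code): toy data and names are
# assumptions; it only mirrors the statistics named in the Methods.
import numpy as np
from scipy.stats import rankdata, wilcoxon
from sklearn.metrics import cohen_kappa_score

def kendall_w(scores):
    """Kendall's W coefficient of concordance for an (n_raters, n_recordings) array."""
    m, n = scores.shape
    ranks = np.vstack([rankdata(row) for row in scores])  # rank each rater's values
    rank_sums = ranks.sum(axis=0)                         # rank sum per recording
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()       # squared deviations from mean rank sum
    return 12 * s / (m ** 2 * (n ** 3 - n))               # W ranges from 0 to 1

# Hypothetical CAP rate values: 4 raters x 11 recordings
rng = np.random.default_rng(0)
cap_rate = rng.uniform(20, 50, size=(1, 11)) + rng.normal(0, 2, size=(4, 11))
print("Kendall W, CAP rate:", round(kendall_w(cap_rate), 3))

# Majority-vote 'consensus' over A-phase subtype labels (columns = events)
labels = np.array([["A1", "A1", "A2"],
                   ["A1", "A2", "A2"],
                   ["A1", "A1", "A3"],
                   ["A1", "A1", "A2"]])
consensus = [max(set(col), key=list(col).count) for col in labels.T]

# Cohen's kappa of one scorer (or the automatic detector) vs. the consensus
print("kappa vs consensus:", cohen_kappa_score(labels[0], consensus))

# Wilcoxon signed-rank test on paired false-positive counts per recording
fp_visual = np.array([3, 2, 4, 1, 5, 2, 3, 4, 2, 3, 1])
fp_auto   = np.array([6, 4, 7, 3, 9, 5, 6, 8, 4, 7, 2])
print(wilcoxon(fp_visual, fp_auto))
```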
The inter-rater reliability of CAP parameters between the four scorers, quantified by the Kendall W coefficient of concordance, was high for all the parameters considered, with values above 0.9 for total CAP time, CAP time in sleep stage 2 and percentage of A phases in sequence; CAP rate also showed a high value (0.829). The most important global parameters of CAP, including total CAP rate and CAP time, obtained by the computer-assisted analysis showed significant concordance with those obtained by the raters. The agreement between the computer-assisted analysis and the consensus scoring for the assignment of the CAP A phase subtype was not distinguishable from that expected from a human scorer. However, the computer-assisted analysis produced significantly more false positive and false negative detections than the visual scoring of CAP.
CAP scoring shows good inter-rater reliability; results obtained in different laboratories can therefore be compared and also pooled together. However, caution should always be exercised because of the variability that can be expected in classical sleep staging. The computer-assisted detection of CAP can be used, with some supervision and correction, in large studies when only general parameters such as CAP rate are considered; more editing is necessary for the correct use of the other results.
This article describes the first detailed evaluation in the literature of the inter-rater reliability of scoring CAP parameters in normal sleep and of the performance of a human-supervised computerized automatic detection system.