Department of Technical Physics, University of Eastern Finland, Kuopio, Finland.
Diagnostic Imaging Center, Kuopio University Hospital, Kuopio, Finland.
J Sleep Res. 2024 Aug;33(4):e14127. doi: 10.1111/jsr.14127. Epub 2023 Dec 26.
We investigated arousal scoring agreement in full-night polysomnography in a multi-centre setting. Ten expert scorers from seven centres annotated 50 polysomnograms according to the American Academy of Sleep Medicine guidelines. Agreement between arousal indexes (ArIs) was assessed using intraclass correlation coefficients (ICCs). In addition, kappa statistics were used to evaluate second-by-second agreement across whole recordings and within individual sleep stages. Finally, arousal clusters, that is, periods with overlapping arousals scored by multiple scorers, were extracted. The overall similarity of the ArIs was fair (ICC = 0.41), varying from poor to excellent between scorer pairs (ICC = 0.04-0.88). ArI similarity was better for respiratory (ICC = 0.65) than for spontaneous (ICC = 0.23) arousals. The overall second-by-second agreement was fair (Fleiss' kappa = 0.40), varying from poor to substantial depending on the scorer pair (Cohen's kappa = 0.07-0.68). Fleiss' kappa increased from light to deep sleep (0.45, 0.45, and 0.53 for stages N1, N2, and N3, respectively), was moderate in the rapid eye movement stage (0.48), and was lowest in the wake stage (0.25). More than half of the arousal clusters were scored by only one or two scorers, and less than a third by at least five scorers. In conclusion, scoring agreement varied with arousal type, sleep stage, and scorer pair, but was relatively low overall. The most uncertain areas were spontaneous arousals and arousals scored in the wake stage. These results indicate that manual arousal scoring is generally not reliable, and that changes are needed in the assessment of sleep fragmentation for clinical and research purposes.
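As an illustration of the second-by-second agreement and arousal-cluster analyses described in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes that each scorer's arousals are given as (start, end) pairs in whole seconds with an exclusive end, that every second is rated as "arousal" or "no arousal" by every scorer, and that a cluster is any chain of arousals from any scorers that overlap in time. The abstract does not state the exact definitions used in the study, and all function names here (`to_mask`, `fleiss_kappa_binary`, `arousal_clusters`) are hypothetical.

```python
from typing import List, Tuple

# Assumption: an arousal is (start_second, end_second), end exclusive.
Event = Tuple[int, int]


def to_mask(events: List[Event], n_seconds: int) -> List[int]:
    """Convert one scorer's arousal events to a per-second 0/1 mask."""
    mask = [0] * n_seconds
    for start, end in events:
        for t in range(max(start, 0), min(end, n_seconds)):
            mask[t] = 1
    return mask


def fleiss_kappa_binary(masks: List[List[int]]) -> float:
    """Fleiss' kappa for several scorers rating each second as 0 or 1."""
    n_raters = len(masks)
    n_items = len(masks[0])
    # Number of scorers who marked each second as arousal.
    counts = [sum(m[t] for m in masks) for t in range(n_items)]
    pairs = n_raters * (n_raters - 1)
    # Mean observed agreement over all seconds.
    p_bar = sum(
        (k * (k - 1) + (n_raters - k) * (n_raters - k - 1)) / pairs
        for k in counts
    ) / n_items
    # Chance agreement from the overall proportion of "arousal" ratings.
    p_arousal = sum(counts) / (n_items * n_raters)
    p_chance = p_arousal ** 2 + (1 - p_arousal) ** 2
    if p_chance == 1.0:
        return float("nan")  # every rating identical; kappa is undefined
    return (p_bar - p_chance) / (1 - p_chance)


def arousal_clusters(scorer_events: List[List[Event]]) -> List[Tuple[int, int, int]]:
    """Merge overlapping arousals across scorers.

    Returns (start, end, number_of_distinct_scorers) for each cluster.
    """
    tagged = sorted(
        (start, end, scorer)
        for scorer, events in enumerate(scorer_events)
        for start, end in events
    )
    clusters: list = []
    for start, end, scorer in tagged:
        if clusters and start < clusters[-1][1]:
            # Overlaps the current cluster: extend it and record the scorer.
            clusters[-1][1] = max(clusters[-1][1], end)
            clusters[-1][2].add(scorer)
        else:
            clusters.append([start, end, {scorer}])
    return [(s, e, len(ids)) for s, e, ids in clusters]


# Toy data (invented for illustration): three scorers, 10-second recording.
events = [[(2, 5)], [(3, 6)], [(8, 9)]]
masks = [to_mask(ev, 10) for ev in events]
kappa = fleiss_kappa_binary(masks)
clusters = arousal_clusters(events)
```

In this toy example the first two scorers' arousals overlap and form one cluster scored by two scorers, while the third scorer's arousal forms a cluster of its own. Tabulating the third element of each cluster tuple gives the kind of distribution the abstract reports, that is, how many clusters were scored by one, two, or more scorers.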