Chylinski Daphne, Berthomier Christian, Lambot Eric, Frenette Sonia, Brandewinder Marie, Carrier Julie, Vandewalle Gilles, Muto Vincenzo
GIGA-Cyclotron Research Centre-In Vivo Imaging (CRC-IVI), University of Liège, Liège, Belgium.
PHYSIP, Paris, France.
J Sleep Res. 2022 Feb;31(1):e13424. doi: 10.1111/jsr.13424. Epub 2021 Jun 24.
Sleep stage scoring can lead to important inter-expert variability. Whether this issue is amplified in older populations, which show alterations of sleep electrophysiology, is likely but has not been thoroughly assessed. Algorithms for automatic sleep stage scoring may appear ideal to eliminate inter-expert variability. Yet, variability between human experts and algorithm sleep stage scoring in healthy older individuals has not been investigated. Here, we aimed to compare stage scoring of older individuals and hypothesized that variability, whether between experts or between experts and the algorithm, would be higher than usually reported in the literature. Night-time sleep recordings of 20 cognitively normal, healthy late-midlife individuals (61 ± 5 years; 10 women) were scored by two experts from different research centres and by one algorithm. We computed agreement for the entire night (percentage and Cohen's κ) and for each sleep stage. Whole-night pairwise agreement was relatively low, ranging from 67% to 78% (κ, 0.54-0.67). Sensitivity across pairs of scorers proved lowest for stages N1 (8.2%-63.4%) and N3 (44.8%-99.3%). Significant differences between experts and/or the algorithm were found for total sleep time, sleep efficiency, time spent in N1/N2/N3 and wake after sleep onset (p ≤ 0.005), but not for sleep onset latency, rapid eye movement (REM) and slow-wave sleep (SWS) duration (N2 + N3). Our results confirm high inter-expert variability in healthy aging. Consensus appears good for REM and for SWS considered as a whole. Consensus seems more difficult for N3, potentially because human raters adapt their interpretation according to overall changes in sleep characteristics. Although the algorithm does not substantially reduce variability, it would favour time-efficient standardization.
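The agreement measures named in the abstract (percentage agreement, Cohen's κ and per-stage sensitivity) can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names, the choice of one scorer as the reference for sensitivity, and the ten-epoch toy hypnograms are all assumptions made here for illustration only.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of epochs on which two scorers assign the same stage."""
    if len(a) != len(b) or not a:
        raise ValueError("hypnograms must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each scorer's marginal stage frequencies.
    p_e = sum(counts_a[s] * counts_b[s] for s in set(a) | set(b)) / (n * n)
    if p_e == 1.0:
        return 1.0  # both scorers used a single identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


def stage_sensitivity(reference, test, stage):
    """Share of the reference scorer's `stage` epochs also labelled `stage`
    by the test scorer. Treating one scorer as reference is an assumption."""
    idx = [i for i, s in enumerate(reference) if s == stage]
    if not idx:
        return None  # stage never scored by the reference
    return sum(test[i] == stage for i in idx) / len(idx)


# Hypothetical toy hypnograms (one label per epoch), not data from the study.
scorer_a = ["W", "N1", "N2", "N2", "N3", "N3", "N2", "REM", "REM", "W"]
scorer_b = ["W", "N2", "N2", "N2", "N2", "N3", "N2", "REM", "REM", "W"]

agreement = percent_agreement(scorer_a, scorer_b)
kappa = cohens_kappa(scorer_a, scorer_b)
n3_sensitivity = stage_sensitivity(scorer_a, scorer_b, "N3")
n1_sensitivity = stage_sensitivity(scorer_a, scorer_b, "N1")

print(f"agreement={agreement:.2f} kappa={kappa:.2f}")
print(f"N3 sensitivity={n3_sensitivity:.2f} N1 sensitivity={n1_sensitivity:.2f}")
```

In this toy case the two scorers disagree only on one N1 epoch and one N3 epoch, yet κ falls well below the raw percentage because much of the agreement on the dominant N2 stage is expected by chance. Sensitivity is not symmetric: swapping which scorer is the reference generally changes the value, which may be one reason a range is reported across scorer pairs.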