Philips Sleep and Respiratory Care, Pittsburgh, PA, USA.
Philips Sleep and Respiratory Care, Vienna, Austria.
Sleep. 2023 Feb 8;46(2). doi: 10.1093/sleep/zsac154.
To quantify the amount of sleep stage ambiguity across expert scorers and to validate a new auto-scoring platform against sleep staging performed by multiple scorers.
We applied a new auto-scoring system to three datasets containing 95 PSGs scored by 6-12 scorers, to compare sleep stage probabilities (hypnodensity; i.e. the probability of each sleep stage being assigned to a given epoch) as the primary output, as well as a single sleep stage per epoch assigned by hierarchical majority rule.
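As an illustration of the two outputs described above, the following minimal sketch derives a hypnodensity from several scorers' epoch-by-epoch labels and then collapses it to one stage per epoch by majority vote. The abstract does not spell out the hierarchy used to resolve ties, so the `TIE_PRIORITY` order, the stage labels, and the function names here are assumptions made only for this example.

```python
from collections import Counter

# Stage labels assumed for this sketch (wake, N1, N2, N3, REM).
STAGES = ["W", "N1", "N2", "N3", "R"]

# Hypothetical tie-break order: the source does not state the hierarchy,
# so this ordering is an assumption for illustration only.
TIE_PRIORITY = ["W", "N1", "N2", "N3", "R"]


def hypnodensity(scorings):
    """Return, per epoch, the fraction of scorers assigning each stage.

    scorings: list of per-scorer label lists, all of equal length.
    """
    n_scorers = len(scorings)
    n_epochs = len(scorings[0])
    if any(len(s) != n_epochs for s in scorings):
        raise ValueError("all scorers must label the same number of epochs")
    result = []
    for epoch_labels in zip(*scorings):
        counts = Counter(epoch_labels)
        result.append({stage: counts.get(stage, 0) / n_scorers for stage in STAGES})
    return result


def majority_stage(probs, priority=TIE_PRIORITY):
    """Pick the most probable stage; ties go to the earliest stage in `priority`."""
    best = max(probs.values())
    for stage in priority:
        if probs[stage] == best:
            return stage


# Four scorers, three epochs.
scorings = [
    ["W", "N1", "N2"],
    ["W", "N2", "N2"],
    ["W", "N1", "N3"],
    ["N1", "N2", "N2"],
]
hd = hypnodensity(scorings)
consensus = [majority_stage(p) for p in hd]
```

In the second epoch the scorers split evenly between N1 and N2, so the result depends entirely on the assumed tie-break order; a different hierarchy would give a different consensus stage for that epoch.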
The percentage of epochs with 100% agreement across scorers was 46 ± 9%, 38 ± 10% and 32 ± 9% for the datasets with 6, 9, and 12 scorers, respectively. The mean intra-class correlation coefficient between sleep stage probabilities from auto- and manual-scoring was 0.91, representing excellent reliability. Within each dataset, agreement between auto-scoring and consensus manual-scoring was significantly higher than agreement between manual-scoring and consensus manual-scoring (0.78 vs. 0.69; 0.74 vs. 0.67; and 0.75 vs. 0.67; all p < 0.01).
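The first figure reported above, the share of epochs on which every scorer agrees, can be computed per PSG and then summarized across PSGs. The sketch below shows one way to do this; the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not say which form of the ± value was reported.

```python
from statistics import mean, stdev


def unanimous_fraction(scorings):
    """Fraction of epochs in one PSG where all scorers assign the same stage.

    scorings: list of per-scorer label lists, all of equal length.
    """
    epochs = list(zip(*scorings))
    if not epochs:
        raise ValueError("no epochs to compare")
    unanimous = sum(1 for labels in epochs if len(set(labels)) == 1)
    return unanimous / len(epochs)


def summarize(fractions):
    """Mean and sample standard deviation (an assumed choice) across PSGs."""
    return mean(fractions), stdev(fractions)


# One toy PSG: three scorers, four epochs, two of them unanimous.
psg = [
    ["W", "N1", "N2", "R"],
    ["W", "N2", "N2", "R"],
    ["W", "N1", "N3", "R"],
]
fraction = unanimous_fraction(psg)

# Made-up per-PSG fractions, used only to show the summary step.
avg, sd = summarize([0.25, 0.5, 0.75])
```

Because a single dissenting scorer is enough to break unanimity, this fraction tends to fall as more scorers are added, which is consistent with the decline from 46% to 32% across the 6-, 9-, and 12-scorer datasets.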
Analysis of scoring performed by multiple scorers reveals that sleep stage ambiguity is the rule rather than the exception. Probabilities of the sleep stages determined by artificial intelligence auto-scoring provide an excellent estimate of this ambiguity. Measured against consensus manual-scoring, sleep staging derived from auto-scoring is noninferior to manual-scoring for each individual PSG, meaning that auto-scoring output is ready for interpretation without the need for manual adjustment.