Interrater sleep stage scoring reliability between manual scoring from two European sleep centers and automatic scoring performed by the artificial intelligence-based Stanford-STAGES algorithm.

Author information

Department of Neurology, Medical University of Innsbruck, Innsbruck, Austria.

Interdisciplinary Sleep Medicine Center, Charité-Universitätsmedizin Berlin, Berlin, Germany.

Publication information

J Clin Sleep Med. 2021 Jun 1;17(6):1237-1247. doi: 10.5664/jcsm.9174.

Abstract

STUDY OBJECTIVES

The objective of this study was to evaluate interrater reliability between manual sleep stage scoring performed in 2 European sleep centers and automatic sleep stage scoring performed by the previously validated artificial intelligence-based Stanford-STAGES algorithm.

METHODS

Full-night polysomnographies of 1,066 participants were included. Sleep stages were scored manually in the Berlin and Innsbruck sleep centers and automatically with the Stanford-STAGES algorithm. For each participant, we compared (1) Innsbruck to Berlin scorings (INN vs BER); (2) Innsbruck to automatic scorings (INN vs AUTO); (3) Berlin to automatic scorings (BER vs AUTO); (4) epochs on which the Innsbruck and Berlin scorers agreed to the automatic scoring (CONS vs AUTO); and (5) both Innsbruck and Berlin manual scorings (MAN) to the automatic ones (MAN vs AUTO). Interrater reliability was evaluated with several measures, including overall and sleep stage-specific Cohen's κ.
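As an illustration of the measures named above, the following is a minimal sketch of overall and stage-specific Cohen's κ for epoch-by-epoch hypnograms. It is not the authors' code: the function names, the stage labels, the toy hypnograms, and the one-vs-rest definition of stage-specific κ are all assumptions made for this example.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of epoch labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    # Observed agreement: fraction of epochs with identical labels.
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each scorer's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if p_exp == 1.0:
        return 1.0  # both scorers used one and the same label throughout
    return (p_obs - p_exp) / (1 - p_exp)


def stage_kappa(a, b, stage):
    """Stage-specific kappa, assumed here to be one-vs-rest."""
    return cohen_kappa([x == stage for x in a], [y == stage for y in b])


def consensus_vs_auto(inn, ber, auto):
    """Kappa between AUTO and the epochs on which INN and BER agree."""
    kept = [(m, c) for m, b, c in zip(inn, ber, auto) if m == b]
    return cohen_kappa([m for m, _ in kept], [c for _, c in kept])


# Hypothetical 8-epoch hypnograms, for illustration only.
inn = ["W", "N1", "N2", "N2", "N3", "REM", "N2", "W"]
ber = ["W", "N2", "N2", "N2", "N3", "REM", "N1", "W"]
auto = ["W", "N2", "N2", "N2", "N2", "REM", "N2", "W"]

kappa_inn_ber = cohen_kappa(inn, ber)
kappa_n1 = stage_kappa(inn, ber, "N1")
kappa_cons_auto = consensus_vs_auto(inn, ber, auto)
```

Because the abstract reports κ as mean ± SD across participants, a function like this would be applied once per participant and the resulting values summarized afterwards.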

RESULTS

Overall agreement across participants was substantial for INN vs BER (κ = 0.66 ± 0.13), INN vs AUTO (κ = 0.68 ± 0.14), CONS vs AUTO (κ = 0.73 ± 0.14), and MAN vs AUTO (κ = 0.61 ± 0.14), and moderate for BER vs AUTO (κ = 0.55 ± 0.15). Human scorers disagreed most on N1 sleep (κ = 0.40 ± 0.16 for INN vs BER). Automatic scoring had the lowest agreement with manual scorings for N1 and N3 sleep (κ = 0.25 ± 0.14 and κ = 0.42 ± 0.32, respectively, for MAN vs AUTO).
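The labels "substantial" and "moderate" used here are consistent with the widely used Landis and Koch bands for κ. The abstract does not name the convention, so the mapping below is an assumption, and `kappa_label` is a hypothetical helper written for this sketch.

```python
def kappa_label(kappa):
    """Map a kappa value to a Landis and Koch agreement label (assumed convention)."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0:
        return "poor"
    # Upper bound of each band, inclusive.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label


# Mean kappa values reported in the abstract.
reported = {
    "INN vs BER": 0.66,
    "INN vs AUTO": 0.68,
    "BER vs AUTO": 0.55,
    "CONS vs AUTO": 0.73,
    "MAN vs AUTO": 0.61,
}
labels = {name: kappa_label(value) for name, value in reported.items()}
```

Under this assumed convention, every comparison falls in the "substantial" band except BER vs AUTO, which falls in "moderate", matching the wording of the results.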

CONCLUSIONS

Interrater reliability for sleep stage scoring between human scorers was in line with previous findings, and the algorithm achieved substantial overall agreement with manual scoring. In this cohort, the Stanford-STAGES algorithm performed similarly to how it performed in the original study, suggesting that it generalizes to new cohorts. Before it is integrated into clinical practice, independent studies should evaluate it further in other cohorts.
