Guillot Antoine, Sauvet Fabien, During Emmanuel H, Thorey Valentin
IEEE Trans Neural Syst Rehabil Eng. 2020 Sep;28(9):1955-1965. doi: 10.1109/TNSRE.2020.3011181. Epub 2020 Jul 22.
Sleep stage classification is an important element of sleep disorder diagnosis. It relies on the visual inspection of polysomnography records by trained sleep technologists. Automated approaches have been designed to alleviate this resource-intensive task. However, such approaches are usually compared to the annotation of a single human scorer, even though inter-rater agreement is only about 85%. The present study introduces two publicly available datasets: DOD-H, which includes 25 healthy volunteers, and DOD-O, which includes 55 patients suffering from obstructive sleep apnea (OSA). Both datasets were scored by 5 sleep technologists from different sleep centers. We developed a framework to compare automated approaches to a consensus of multiple human scorers. Using this framework, we benchmarked the main approaches from the literature against a new deep learning method, SimpleSleepNet, which reaches state-of-the-art performance while being more lightweight. We demonstrated that many methods can reach human-level performance on both datasets. SimpleSleepNet achieved an F1 of 89.9% vs 86.8% on average for human scorers on DOD-H, and an F1 of 88.3% vs 84.8% on DOD-O. Our study highlights that state-of-the-art automated sleep staging outperforms human scorers' performance for both healthy volunteers and patients suffering from OSA. Automated approaches could therefore be considered for use in the clinical setting.
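The consensus-based comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the abstract does not specify how the consensus is built or which F1 variant is reported, so the sketch assumes an epoch-wise majority vote with ties resolved in favor of the earliest-listed scorer, a macro-averaged F1 over the five standard stages, and a leave-one-out consensus when evaluating a human scorer. All function names and the toy hypnograms are hypothetical.

```python
from collections import Counter

# Five standard sleep stages (label strings are an assumption of this sketch).
STAGES = ["W", "N1", "N2", "N3", "REM"]


def majority_consensus(scorings):
    """Epoch-wise majority vote over several scorers' hypnograms.

    Assumption: ties go to the label given by the earliest-listed scorer
    among the tied labels. The paper's tie-breaking rule may differ.
    """
    consensus = []
    for labels in zip(*scorings):
        counts = Counter(labels)
        top = max(counts.values())
        consensus.append(next(lab for lab in labels if counts[lab] == top))
    return consensus


def macro_f1(reference, predicted, classes=STAGES):
    """Unweighted mean of per-class F1; classes absent from both are skipped."""
    scores = []
    for c in classes:
        pairs = list(zip(reference, predicted))
        tp = sum(r == c and p == c for r, p in pairs)
        fp = sum(r != c and p == c for r, p in pairs)
        fn = sum(r == c and p != c for r, p in pairs)
        if tp + fp + fn == 0:
            continue
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores) if scores else 0.0


def score_human(scorings, index):
    """Score one human against the consensus of the remaining scorers."""
    others = scorings[:index] + scorings[index + 1:]
    return macro_f1(majority_consensus(others), scorings[index])


# Toy data: 5 hypothetical scorers, 6 epochs each.
scorers = [
    ["W", "N1", "N2", "N2", "N3", "REM"],
    ["W", "N1", "N2", "N2", "N3", "REM"],
    ["W", "N2", "N2", "N2", "N3", "REM"],
    ["W", "N1", "N2", "N3", "N3", "REM"],
    ["W", "W", "N2", "N2", "N2", "REM"],
]
model = ["W", "N1", "N2", "N2", "N3", "REM"]

consensus = majority_consensus(scorers)
model_f1 = macro_f1(consensus, model)
human_f1 = [score_human(scorers, i) for i in range(len(scorers))]
print("model F1:", model_f1)
print("mean human F1:", sum(human_f1) / len(human_f1))
```

Scoring each human against a consensus that excludes their own annotation keeps the human baseline from being inflated by self-agreement. In this sketch the model is scored against the consensus of all five scorers; whether the paper uses the same reference for models and humans is not stated in the abstract.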