Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN.
Department of Neurology, Sleep Disorders Division, Vanderbilt University School of Medicine, Nashville, TN.
Sleep. 2019 Oct 21;42(11). doi: 10.1093/sleep/zsz159.
Polysomnography (PSG) scoring is labor intensive and suffers from inter- and intra-rater variability. Automated PSG scoring has the potential to reduce both the human labor costs and the variability inherent to this task. Deep learning is a form of machine learning that uses neural networks to recognize data patterns by inspecting many examples rather than by following explicit programming.
A sleep staging classifier trained using deep learning methods scored PSG data from the Sleep Heart Health Study (SHHS). The training set was composed of 42 560 hours of PSG data from 5213 patients. To capture higher-order data, spectrograms were generated from electroencephalography, electrooculography, and electromyography data and then passed to the neural network. A holdout set of 580 PSGs not included in the training set was used to assess model accuracy and discrimination via weighted F1-score, per-stage accuracy, and Cohen's kappa (K).
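The spectrogram step described above can be illustrated with a minimal sketch. It is not the authors' preprocessing pipeline: the 30-second epoch length, the 125 Hz sampling rate, the 2-second Hann window with 1-second step, and the function name `epoch_spectrogram` are all assumptions chosen for illustration, and the input is a synthetic signal rather than SHHS data.

```python
import numpy as np

def epoch_spectrogram(signal, fs, window_s=2.0, step_s=1.0):
    """Return (freqs, times, log_power) for one single-channel epoch.

    window_s and step_s are illustrative defaults, not values from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    nperseg = int(round(window_s * fs))
    step = int(round(step_s * fs))
    if signal.size < nperseg:
        raise ValueError("epoch is shorter than one analysis window")
    window = np.hanning(nperseg)
    starts = np.arange(0, signal.size - nperseg + 1, step)
    # Stack overlapping segments, remove each segment's mean, apply the window.
    segments = np.stack([signal[s:s + nperseg] for s in starts])
    segments = (segments - segments.mean(axis=1, keepdims=True)) * window
    # Power per frequency bin for every segment (one-sided FFT).
    power = np.abs(np.fft.rfft(segments, axis=1)) ** 2
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    times = (starts + nperseg / 2) / fs
    # Rows are frequencies, columns are time steps; values are in dB.
    return freqs, times, 10.0 * np.log10(power.T + 1e-12)

# Synthetic stand-in for an EEG epoch: a 10 Hz rhythm plus noise (hypothetical data).
fs = 125
t = np.arange(30 * fs) / fs
rng = np.random.default_rng(0)
epoch = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)
freqs, times, spec = epoch_spectrogram(epoch, fs)
peak_hz = freqs[spec.mean(axis=1).argmax()]
```

The resulting two-dimensional array (frequency by time) is the kind of image-like input that convolutional layers can consume; one such array per channel (electroencephalography, electrooculography, electromyography) could be stacked before being passed to a network.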
The best-performing neural network model took spectrograms as input, passed them through convolutional neural network layers followed by a long short-term memory layer, and achieved a weighted F1-score of 0.87 and K = 0.82.
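For readers unfamiliar with the two summary metrics, the following sketch shows how a weighted F1-score and Cohen's kappa can be computed from per-epoch stage labels. The function names and the ten-epoch toy labels are hypothetical and unrelated to the study data; the formulas are the standard definitions.

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Chance-corrected agreement between two label sequences.

    Undefined (division by zero) if both sequences contain a single identical class.
    """
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Agreement expected by chance from the two marginal label distributions.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    return (observed - expected) / (1 - expected)

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    n = len(y_true)
    total = 0.0
    for label, support in Counter(y_true).items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = support - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += (support / n) * f1
    return total

# Toy example: expert labels versus model labels for ten epochs (made-up values).
expert = ["W", "W", "N2", "N2", "N2", "N2", "N3", "N3", "REM", "REM"]
model = ["W", "W", "N2", "N2", "N2", "N3", "N3", "N3", "REM", "N2"]
kappa = cohen_kappa(expert, model)
f1 = weighted_f1(expert, model)
```

Weighting F1 by class support keeps frequent stages from being under-represented in the summary, while kappa discounts the agreement that two scorers would reach by chance given how often each assigns every stage.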
The deep learning sleep stage classifier demonstrates excellent accuracy and agreement with expert sleep stage scoring, exceeding the level of agreement among human scorers on sleep staging. Its F1-scores, accuracy, and Cohen's kappa are comparable to or better than those reported in the literature for automated sleep stage scoring of PSG epochs. Accurate automated scoring of other PSG events may eventually allow for fully automated PSG scoring.