Hong Jung Kyung, Lee Taeyoung, Delos Reyes Roben Deocampo, Hong Joonki, Tran Hai Hong, Lee Dongheon, Jung Jinhwan, Yoon In-Young
Department of Psychiatry, Seoul National University Bundang Hospital, Seongnam, Korea.
Seoul National University College of Medicine, Seoul, Korea.
Nat Sci Sleep. 2021 Dec 24;13:2239-2250. doi: 10.2147/NSS.S333566. eCollection 2021.
Automated sleep stage scoring is not yet widely used in practice because of its black-box nature and the risk of wrong predictions. The objective of this study was to introduce a confidence-based framework that detects possibly wrong predictions, informing clinicians which epochs require manual review, and to investigate its potential to improve the accuracy of automated sleep stage scoring.
We used 702 polysomnography studies from a local clinical dataset (SNUBH dataset) and 2804 from an open dataset (SHHS dataset) for experiments. We adapted the state-of-the-art TinySleepNet architecture to train the classifier and modified the ConfidNet architecture to train an auxiliary confidence model. For the confidence model, we developed a novel method, Dropout Correct Rate (DCR), and compared its performance with that of existing methods.
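The abstract names Dropout Correct Rate (DCR) but does not define it. The sketch below takes one plausible reading of the name, and this reading is an assumption, not the authors' stated definition. Under it, dropout is kept active at inference, the same epoch is passed through the network many times, and DCR is the fraction of those stochastic passes whose predicted class equals the true label. A value like this could then act as a target that the auxiliary confidence model learns to predict. The toy linear classifier `make_toy_dropout_classifier` and the parameters `p` and `n_passes` are hypothetical stand-ins for the TinySleepNet-based classifier and its settings.

```python
import random
from typing import Callable, List, Sequence


# Hypothetical helper: a toy linear classifier with inverted dropout on its inputs,
# standing in for a sleep-staging network run with dropout kept active at inference.
def make_toy_dropout_classifier(
    weights: List[List[float]], p: float = 0.5, seed: int = 0
) -> Callable[[Sequence[float]], int]:
    rng = random.Random(seed)

    def predict(x: Sequence[float]) -> int:
        # Zero each feature with probability p and rescale survivors by 1 / (1 - p).
        kept = [xi / (1.0 - p) if rng.random() >= p else 0.0 for xi in x]
        scores = [sum(w * xi for w, xi in zip(row, kept)) for row in weights]
        # argmax over class scores; ties resolve to the lowest class index
        return max(range(len(scores)), key=lambda k: scores[k])

    return predict


def dropout_correct_rate(
    predict_with_dropout: Callable[[Sequence[float]], int],
    x: Sequence[float],
    true_label: int,
    n_passes: int = 30,
) -> float:
    """Assumed DCR: share of stochastic forward passes that predict true_label."""
    hits = sum(1 for _ in range(n_passes) if predict_with_dropout(x) == true_label)
    return hits / n_passes


if __name__ == "__main__":
    clf = make_toy_dropout_classifier([[1.0, 0.0], [0.0, 1.0]], p=0.5, seed=42)
    # An ambiguous input: the predicted class flips depending on which features survive.
    print(dropout_correct_rate(clf, [1.0, 1.0], true_label=0, n_passes=200))
```

Unlike a single softmax probability, a rate computed this way reflects how stable the prediction is under perturbation of the network. That stability is the intuition the name suggests.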
In general, confidence estimates (0.754) closely reflected accuracy (0.758). The DCR method performed best at differentiating correct from wrong predictions (AUROC: 0.812), whereas the existing approaches largely failed to detect wrong predictions. By reviewing only the 20% of epochs with the lowest confidence values, the overall accuracy of sleep stage scoring improved from 76% to 87%. For patients with reduced accuracy (ie, individuals with obesity or severe sleep apnea), the possible improvement after applying confidence estimation was even greater.
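The two evaluations in this paragraph can be sketched as follows. AUROC treats each correct prediction as a positive and each wrong prediction as a negative, with the confidence value as the score. The review scenario sorts epochs by confidence and sends the lowest fraction to manual review. This sketch assumes that a manual review always yields the correct stage, which is an idealization. The function names and the ten-epoch example data are hypothetical and are not taken from the study.

```python
from typing import Sequence


def auroc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one correct and one wrong prediction")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def accuracy_after_review(
    confidences: Sequence[float], correct: Sequence[bool], review_fraction: float
) -> float:
    """Accuracy when the lowest-confidence epochs are reviewed and (assumed) fixed."""
    n = len(confidences)
    n_review = round(review_fraction * n)
    order = sorted(range(n), key=lambda i: confidences[i])  # ascending confidence
    reviewed = set(order[:n_review])
    return sum(1 for i in range(n) if i in reviewed or correct[i]) / n


if __name__ == "__main__":
    # Made-up toy data: per-epoch confidence and whether the automated stage was correct.
    conf = [0.9, 0.8, 0.3, 0.95, 0.2, 0.7, 0.85, 0.4, 0.6, 0.99]
    ok = [True, True, False, True, False, True, True, True, False, True]
    print("AUROC:", auroc(conf, ok))
    print("before review:", sum(ok) / len(ok))
    print("after reviewing 20%:", accuracy_after_review(conf, ok, 0.2))
```

A higher AUROC means more of the wrong predictions fall into the low-confidence tail. Reviewing a fixed fraction of epochs then corrects more errors per reviewed epoch.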
To the best of our knowledge, this is the first study to apply confidence estimation to automated sleep stage scoring. Reliable confidence estimates from the DCR method help screen out most wrong predictions, which would increase the reliability and interpretability of automated sleep stage scoring.