Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan.
International Institute for Integrative Sleep Medicine (WPI-IIIS), University of Tsukuba, Tsukuba, Japan.
Sci Rep. 2022 Jul 27;12(1):12799. doi: 10.1038/s41598-022-16334-9.
Scoring sleep stages from biological signals is an essential but labor-intensive step in sleep diagnosis. Existing automated scoring methods have achieved high accuracy but are not widely applied in clinical practice. In our understanding, these methods have failed to earn the trust of sleep experts (e.g., physicians and clinical technologists) because they cannot explain the evidence behind their scoring decisions. In this study, we developed a deep-learning-based scoring model with a reasoning mechanism called class activation mapping (CAM) to solve this problem. This mechanism explicitly shows which portions of the signals support our model's sleep stage decision, and we verified that these portions overlap with the "characteristic waves," which are the evidence used in the manual scoring process. This explainability comes at a cost: employing CAM makes it difficult for the model to follow some scoring rules. Although we were concerned about the negative effect of CAM on scoring accuracy, we found that the impact is limited. The evaluation experiment shows that the proposed model achieved a scoring accuracy of [Formula: see text]. This is superior to the accuracy of some existing methods and to the inter-rater reliability among sleep experts. These results suggest that Sleep-CAM achieves both explainability and the scoring accuracy required for practical use.
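To make the CAM idea concrete, the following is a minimal sketch of class activation mapping for a one-dimensional signal. It assumes the commonly used CAM formulation (global average pooling over the last convolutional feature maps, followed by a bias-free linear layer); the abstract does not specify the architecture, so the function names, the toy feature maps, and the weights below are all hypothetical and are not taken from Sleep-CAM.

```python
# Hypothetical sketch of 1D class activation mapping (CAM); not the authors' code.
# Assumption: the classifier is global average pooling + a bias-free linear layer.

def class_activation_map(feature_maps, class_weights):
    """Return the CAM over time for one class.

    feature_maps:  list of K channels, each a list of T activations
                   (output of the last convolutional layer).
    class_weights: list of K weights linking each pooled channel to the class logit.
    """
    n_steps = len(feature_maps[0])
    return [
        sum(w * channel[t] for w, channel in zip(class_weights, feature_maps))
        for t in range(n_steps)
    ]


def class_logit(feature_maps, class_weights):
    """Class logit: global average pooling, then a weighted sum of the pooled channels."""
    pooled = [sum(channel) / len(channel) for channel in feature_maps]
    return sum(w * p for w, p in zip(class_weights, pooled))


# Toy input: 2 channels, 4 time steps (made-up numbers for illustration only).
features = [[0.0, 1.0, 3.0, 0.0],
            [2.0, 0.0, 0.0, 2.0]]
weights = [1.0, -0.5]

cam = class_activation_map(features, weights)

# Time step that contributes most to the class decision.
peak = max(range(len(cam)), key=cam.__getitem__)
```

Under these assumptions, the mean of the CAM over time equals the class logit, so each time step's CAM value can be read as its contribution to the decision. This is the property that lets highlighted portions of a signal be compared against the characteristic waves used in manual scoring.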