IEEE Trans Pattern Anal Mach Intell. 2022 Sep;44(9):5903-5915. doi: 10.1109/TPAMI.2021.3070057. Epub 2022 Aug 4.
Automating sleep staging is vital to scale up sleep assessment and diagnosis to serve millions experiencing sleep deprivation and disorders, and to enable longitudinal sleep monitoring in home environments. Learning from raw polysomnography signals and from their derived time-frequency image representations has been prevalent. However, learning from multi-view inputs (e.g., both the raw signals and the time-frequency images) for sleep staging is difficult and not well understood. This work proposes a sequence-to-sequence sleep staging model, XSleepNet, that is capable of learning a joint representation from both raw signals and time-frequency images. Since different views may generalize or overfit at different rates, the proposed network is trained such that the learning pace on each view is adapted based on its generalization/overfitting behavior. In simple terms, learning on a particular view is sped up when that view is generalizing well and slowed down when it is overfitting. View-specific generalization/overfitting measures are computed on the fly during training and used to derive weights that blend the gradients from the different views. As a result, the network retains the representation power of the different views in the joint features, which capture the underlying distribution better than those learned from any individual view alone. Furthermore, the XSleepNet architecture is principally designed to gain robustness to the amount of training data and to increase the complementarity between the input views. Experimental results on five databases of different sizes show that XSleepNet consistently outperforms the single-view baselines as well as a multi-view baseline with a simple fusion strategy. Finally, XSleepNet also outperforms prior sleep staging methods and improves on previous state-of-the-art results on the experimental databases.
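The adaptive multi-view training described above can be illustrated with a minimal sketch. The function below is a hypothetical implementation of the general idea (not XSleepNet's exact formulation): for each view, a generalization measure is taken from the drop in validation loss over a recent window, an overfitting measure from how much the training-loss drop exceeds it, and the per-view gradient-blending weights are derived so that views that generalize well and overfit little dominate the blended gradient.

```python
def blending_weights(train_losses, val_losses):
    """Derive per-view gradient-blending weights from generalization/overfitting.

    Hypothetical sketch: `train_losses` and `val_losses` each hold one loss
    history (oldest first) per view. Generalization g is the validation-loss
    improvement over the window; overfitting o is the part of the training-loss
    improvement not matched on validation. Weighting by g / o**2 (normalized to
    sum to 1) speeds up learning on generalizing views and slows it on
    overfitting ones; XSleepNet's actual measures may differ.
    """
    eps = 1e-8
    raw = []
    for tr, va in zip(train_losses, val_losses):
        g = max(va[0] - va[-1], eps)            # validation improvement
        o = max((tr[0] - tr[-1]) - g, eps)      # train improvement not on val
        raw.append(g / (o * o))                 # favor generalizing views
    total = sum(raw)
    return [r / total for r in raw]

# View A generalizes (val loss tracks train loss down); view B overfits
# (train loss drops while val loss barely moves), so A gets a larger weight.
w_a, w_b = blending_weights(
    train_losses=[[1.0, 0.8, 0.6], [1.0, 0.6, 0.3]],
    val_losses=[[1.0, 0.85, 0.7], [1.0, 0.98, 0.97]],
)
```

In full training, these weights would rescale each view's loss (and hence its gradients) before the backward pass, and would be recomputed periodically as the loss histories evolve.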