Hong Joonki, Yang Seung Koo, Kim Seunghun, Cho Sung-Woo, Oh Jayoung, Cho Eun Sung, Yoon In-Young, Lee Dongheon, Kim Jeong-Whun
Asleep Research Institute, Seoul, Republic of Korea.
Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University Bundang Hospital, Seoul National University College of Medicine, Seongnam, Republic of Korea.
Nat Sci Sleep. 2025 Mar 31;17:519-530. doi: 10.2147/NSS.S514631. eCollection 2025.
Despite the prevalence of sleep-related disorders, few studies have developed deep learning models to predict snoring from home-recorded smartphone audio. This study proposes a real-time snoring detection method that applies a Vision Transformer-based deep learning model to smartphone recordings.
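The abstract does not disclose the network's configuration; the sketch below only illustrates the general approach of applying a Vision Transformer to audio epochs, converting each 30-second recording to a log-mel spectrogram and classifying it with a small ViT. All hyperparameters (16 kHz sample rate, 64 mel bins, 8x8 patches, model width and depth) are illustrative assumptions, not the authors' values.

```python
# A minimal sketch, NOT the authors' published architecture: a small
# Vision Transformer that scores 30-second audio epochs for snoring
# from log-mel spectrograms.
import torch
import torch.nn as nn
import torchaudio

SAMPLE_RATE = 16_000   # assumed recording rate, not stated in the abstract
EPOCH_SECONDS = 30     # PSG epoch length from the paper
HOP = 512

class SnoreViT(nn.Module):
    """Hypothetical ViT classifier over log-mel spectrograms of 30 s epochs."""
    def __init__(self, n_mels=64, patch=8, dim=128, depth=4, heads=4):
        super().__init__()
        self.melspec = torchaudio.transforms.MelSpectrogram(
            sample_rate=SAMPLE_RATE, n_fft=1024, hop_length=HOP, n_mels=n_mels)
        self.to_db = torchaudio.transforms.AmplitudeToDB()
        # Non-overlapping patches via a strided convolution (standard ViT patchify).
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        n_frames = EPOCH_SECONDS * SAMPLE_RATE // HOP + 1  # torchaudio default centering
        n_patches = (n_mels // patch) * (n_frames // patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, 1)                      # snoring logit

    def forward(self, wav):                                # wav: (batch, samples)
        x = self.to_db(self.melspec(wav)).unsqueeze(1)     # (B, 1, mels, frames)
        x = self.patch_embed(x).flatten(2).transpose(1, 2) # (B, patches, dim)
        x = torch.cat([self.cls.expand(x.size(0), -1, -1), x], dim=1)
        x = self.encoder(x + self.pos[:, : x.size(1)])
        return self.head(x[:, 0]).squeeze(-1)              # classify via CLS token

# One 30-second epoch of dummy audio -> snoring probability.
model = SnoreViT()
wav = torch.randn(1, EPOCH_SECONDS * SAMPLE_RATE)
print(torch.sigmoid(model(wav)))
```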
Participants' sleep-breathing sounds were recorded with smartphones while concurrent Level I or Level II polysomnography (PSG) was performed in home or hospital settings. For each participant, 200 minutes of smartphone audio, corresponding to 400 30-second sleep-stage epochs on PSG, were sampled. Each epoch was annotated independently by two trained labelers, and an epoch was labeled as snoring only when both agreed. Model performance was evaluated by epoch-by-epoch prediction accuracy and by the correlation between observed and predicted snoring ratios.
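A minimal sketch of the two-annotator labeling rule described above, assuming that an epoch the labelers disagree on is treated as non-snoring (the abstract states only that snoring was labeled when both agreed):

```python
# Hypothetical helper, not the authors' code: per-epoch consensus labels
# from two independent annotators. Snoring is positive only when BOTH
# labelers marked the epoch as snoring; disagreements fall back to
# non-snoring under our assumed policy.
def consensus_labels(labeler_a: list[bool], labeler_b: list[bool]) -> list[bool]:
    assert len(labeler_a) == len(labeler_b), "one mark per 30 s epoch"
    return [a and b for a, b in zip(labeler_a, labeler_b)]

# 400 epochs per participant (200 minutes / 30-second epochs); 4 shown here.
a = [True, True, False, False]
b = [True, False, False, True]
print(consensus_labels(a, b))   # [True, False, False, False]
```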
The study included 214 participants (85,600 epochs). Hospital audio data from 105 participants (42,000 epochs) were used for training, while home audio data from 109 participants were split into a training set of 54 participants (21,600 epochs) and a test set of 55 participants (22,000 epochs). On the test set, the model achieved a sensitivity of 89.8% and a specificity of 91.3%. Correlation analysis showed strong agreement between observed and predicted snoring ratios (r = 0.97, 95% CI: 0.95-0.99).
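For reference, metrics of this kind can be computed as below. This is a generic sketch, not the authors' analysis code: epoch-level sensitivity and specificity from binary labels, and the Pearson correlation between per-participant snoring ratios with a 95% CI via the Fisher z-transform (the abstract does not state how its CI was derived). The snoring-ratio values are made up for illustration.

```python
import numpy as np
from scipy import stats

def sensitivity_specificity(y_true, y_pred):
    """Epoch-by-epoch agreement between consensus labels and predictions."""
    y_true, y_pred = np.asarray(y_true, bool), np.asarray(y_pred, bool)
    tp = np.sum(y_true & y_pred);   fn = np.sum(y_true & ~y_pred)
    tn = np.sum(~y_true & ~y_pred); fp = np.sum(~y_true & y_pred)
    return tp / (tp + fn), tn / (tn + fp)

def pearson_with_ci(x, y, alpha=0.05):
    """Pearson r with a CI from the Fisher z-transform (assumed method)."""
    r, _ = stats.pearsonr(x, y)
    z, se = np.arctanh(r), 1.0 / np.sqrt(len(x) - 3)
    zcrit = stats.norm.ppf(1 - alpha / 2)
    return r, (np.tanh(z - zcrit * se), np.tanh(z + zcrit * se))

# Illustrative per-participant snoring ratios (snoring epochs / 400 epochs).
obs  = np.array([0.10, 0.35, 0.02, 0.50, 0.22])
pred = np.array([0.12, 0.33, 0.04, 0.47, 0.25])
r, (lo, hi) = pearson_with_ci(obs, pred)
print(f"r = {r:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")
```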
This study demonstrates the feasibility of using deep learning for real-time snoring detection from home-recorded smartphone audio. With high accuracy and scalability, the approach offers a practical and accessible tool for monitoring sleep-related disorders, paving the way for home-based sleep health management solutions.