Yu Zeren
College of Biomedical Engineering, Hainan University, Haikou, Hainan, China.
medRxiv. 2025 Jun 16:2025.06.15.25329640. doi: 10.1101/2025.06.15.25329640.
Identifying robust biomarkers for future cardiometabolic risk within the crucial "preventive window" in healthy individuals remains a major challenge. While numerous sleep metrics are linked to health, their hierarchical importance is unknown. This study aimed to leverage a data-driven machine learning paradigm to move beyond conventional metrics and objectively identify the core sleep-related physiological drivers for predicting the transition to early-stage cardiometabolic risk.
We conducted a longitudinal analysis of 447 initially healthy participants from the Sleep Heart Health Study (SHHS). After a data quality audit that excluded high-missingness variables (e.g., heart rate variability), a LASSO (L1-regularized) logistic regression model was trained on 16 high-quality clinical and polysomnographic features to perform data-driven biomarker selection. Final model performance was evaluated using 10 repeats of 10-fold cross-validation, and models were compared using paired t-tests.
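The workflow described in the methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual code: the data are synthetic (generated with scikit-learn's `make_classification`, not SHHS), the regularization strength `C=0.1` is arbitrary, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort: 447 rows, 16 features (NOT SHHS data).
X, y = make_classification(
    n_samples=447, n_features=16, n_informative=6, n_redundant=2, random_state=0
)

# L1-penalized logistic regression; features are standardized so the
# penalty treats them on a common scale. C=0.1 is an assumed value.
lasso = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
lasso.fit(X, y)
coef = lasso.named_steps["logisticregression"].coef_.ravel()
selected = np.flatnonzero(coef)  # indices of features with non-zero weight

# 10 repeats of 10-fold cross-validation; the fixed seed gives both models
# identical splits, so their per-fold AUCs are paired.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)


def plain_model():
    # Unpenalized-style baseline classifier used for both feature sets.
    return make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))


auc_full = cross_val_score(plain_model(), X, y, cv=cv, scoring="roc_auc")
auc_lean = cross_val_score(plain_model(), X[:, selected], y, cv=cv, scoring="roc_auc")

print(f"selected {len(selected)} of 16 features")
print(f"mean AUC full={auc_full.mean():.3f} lean={auc_lean.mean():.3f}")
```

In this sketch the features are selected once on the full data set for brevity; a stricter variant would repeat the selection inside each training fold so that the held-out fold never influences which features are kept.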
LASSO regression identified a parsimonious set of six core predictors. Notably, respiratory disturbance index (RDI) and minimum nocturnal oxygen saturation (min_spo2) emerged as the key biomarkers, superseding traditional sleep fragmentation metrics like the arousal index. In the primary cross-validation analysis, the lean LASSO model demonstrated the strongest predictive performance (mean AUC = 0.698), statistically outperforming a complex model with all 16 features (mean AUC = 0.669, p<0.0001). This superiority and robustness were maintained in high-risk subgroups.
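The model comparison reported above relies on a paired t-test over per-fold AUC values. The mechanics can be sketched with the standard library alone. The function name `paired_t` and the AUC lists below are hypothetical placeholders, not results from the study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical per-fold AUCs for two models scored on the same folds.
auc_lean_demo = [0.71, 0.68, 0.70, 0.69, 0.72]
auc_full_demo = [0.68, 0.66, 0.67, 0.67, 0.69]

t_stat, dof = paired_t(auc_lean_demo, auc_full_demo)
print(f"t = {t_stat:.2f} on {dof} degrees of freedom")
```

With 10 repeats of 10-fold cross-validation there would be 100 paired values and 99 degrees of freedom. Fold-level scores from repeated cross-validation share training data, so this shows only how the statistic is computed, not the study's exact procedure.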
Our data-driven approach reveals that physiological stress directly linked to sleep-disordered breathing and nocturnal hypoxemia, rather than general sleep fragmentation, is the primary driver of the transition towards early cardiometabolic risk in healthy individuals. This finding provides specific, translatable targets for precision preventive medicine, points towards novel mechanisms for early risk development, and offers a blueprint for developing next-generation screening tools, potentially integrated into wearable technology.