Centre for Transport Studies, Department of Civil and Environmental Engineering, Imperial College London, London SW7 2AZ, UK.
Department of Medical Physics and Biomedical Engineering, University College London, London WC1E 6BT, UK.
Sensors (Basel). 2022 Nov 9;22(22):8630. doi: 10.3390/s22228630.
Obstructive sleep apnea (OSA) is a global health concern and is typically diagnosed using in-laboratory polysomnography (PSG). However, PSG is time-consuming and labor-intensive. We therefore developed machine learning models based on easily obtained anthropometric features to screen for the risk of moderate-to-severe and severe OSA. We enrolled 3503 patients from Taiwan and recorded their PSG parameters and anthropometric features. We then compared mean values among patients with different OSA severity levels and examined correlations across all participants. We developed models using the following machine learning approaches: logistic regression, k-nearest neighbors, naïve Bayes, random forest (RF), support vector machine, and XGBoost. The collected data were first independently split into two data sets (training and validation: 80%; testing: 20%). We then applied the model with the highest accuracy in the training and validation stage to the testing set. We assessed the importance of each feature in OSA risk screening by calculating the Shapley value of each input variable. The RF model achieved the highest accuracy for moderate-to-severe (84.74%) and severe (72.61%) OSA. Visceral fat level was the predominant feature in the risk screening models for both severity levels. Our machine learning models can be employed to screen for OSA risk in populations in Taiwan and in those with similar craniofacial structures.
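To illustrate what a Shapley value of an input variable measures, the following minimal sketch computes exact Shapley values for a single sample by enumerating all feature coalitions. It is not the authors' implementation: the abstract does not state how the values were computed, and the scoring function `toy_risk_score`, its coefficients, the feature names other than visceral fat, and the sample and baseline values are all hypothetical stand-ins for a trained model and real patient data.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Model output when only the coalition's features take their real values.
        masked = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(masked)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk_score(f):
    # Hypothetical score, NOT a fitted model: coefficients are invented.
    visceral_fat, neck_cm, bmi = f
    return (0.04 * visceral_fat + 0.01 * neck_cm + 0.02 * bmi
            + 0.001 * visceral_fat * bmi)


names = ["visceral_fat", "neck_cm", "bmi"]   # illustrative feature names
x = [15, 40, 30]          # hypothetical patient
baseline = [8, 36, 24]    # hypothetical reference values (e.g. cohort means)

phi = shapley_values(toy_risk_score, x, baseline)
for name, contribution in zip(names, phi):
    print(f"{name}: {contribution:.3f}")
```

The contributions sum to the difference between the score for the sample and the score for the baseline, which is the property that makes Shapley values usable as a per-feature attribution. For models with many inputs, such as a random forest over a full set of anthropometric features, exact enumeration is infeasible and approximate or model-specific methods are typically used instead.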