Sheta Alaa, Thaher Thaer, Surani Salim R, Turabieh Hamza, Braik Malik, Too Jingwei, Abu-El-Rub Noor, Mafarjah Majdi, Chantar Hamouda, Subramanian Shyam
Computer Science Department, Southern Connecticut State University, New Haven, CT 06514, USA.
Department of Computer Systems Engineering, Arab American University, Jenin P.O. Box 240, Palestine.
Diagnostics (Basel). 2023 Jul 20;13(14):2417. doi: 10.3390/diagnostics13142417.
Obstructive sleep apnea (OSA) is a prevalent sleep disorder that affects approximately 3-7% of males and 2-5% of females. In the United States alone, 50-70 million adults suffer from various sleep disorders. OSA is characterized by recurrent episodes of breathing cessation during sleep, leading to adverse effects such as daytime sleepiness, cognitive impairment, and reduced concentration. It also increases the risk of cardiovascular conditions and adversely affects patients' overall quality of life. As a result, numerous researchers have focused on developing automated detection models that identify OSA effectively and accurately. This study explored the potential benefits of machine learning methods based on demographic information for diagnosing OSA syndrome. We gathered a comprehensive dataset from the Torr Sleep Center in Corpus Christi, Texas, USA. The dataset comprises 31 features, including demographic characteristics such as race, age, sex, BMI, Epworth score, M. Friedman tongue position, snoring, and more. To achieve the research objectives, we devised a novel process encompassing pre-processing, data grouping, feature selection, and machine learning classification. The classification methods employed were decision tree (DT), naive Bayes (NB), k-nearest neighbor (kNN), support vector machine (SVM), linear discriminant analysis (LDA), logistic regression (LR), and subspace discriminant (Ensemble) classifiers. The experimental results indicated that the optimized kNN and SVM classifiers performed best at classifying sleep apnea. Moreover, model accuracy improved significantly when the selected demographic variables were used together with data grouping. For instance, with feature selection applied to the grouped data of Caucasians, females, and individuals aged 50 or below, accuracy improved by approximately 4.5%, 5%, and 10%, respectively. Furthermore, a comparison with prior studies confirmed that effective data grouping and proper feature selection yielded superior performance in OSA detection when combined with an appropriate classification method. Overall, the findings highlight the importance of leveraging demographic information, employing proper feature selection techniques, and using optimized classification models for accurate and efficient OSA diagnosis.
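The process described in the abstract (pre-processing, data grouping, feature selection, classification) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records are randomly generated and are not the Torr Sleep Center data, the labeling rule is made up, the feature names `age`, `bmi`, and `noise` are hypothetical, and an exhaustive subset search stands in for whatever feature selection method the study actually used. Only the overall shape (group, select features, classify with kNN) follows the text.

```python
import random
from collections import Counter
from itertools import combinations


def make_synthetic_records(n=200, seed=0):
    """Generate made-up records for illustration; NOT the study's dataset."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        age = rng.randint(20, 80)
        bmi = rng.uniform(18.0, 45.0)
        # Hypothetical labeling rule, invented only to give kNN something to learn.
        label = 1 if bmi + 0.1 * age + rng.gauss(0, 2) > 36 else 0
        records.append({"sex": rng.choice(["F", "M"]), "age": age, "bmi": bmi,
                        "noise": rng.uniform(0.0, 1.0), "osa": label})
    return records


def group_by(rows, key):
    """Data grouping: split the records into subsets, e.g. by sex."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row)
    return groups


def scale(train, test, features):
    """Pre-processing: min-max scale using statistics of the training rows only."""
    lo = {f: min(r[f] for r in train) for f in features}
    span = {f: (max(r[f] for r in train) - lo[f]) or 1.0 for f in features}

    def to_vector(row):
        return [(row[f] - lo[f]) / span[f] for f in features]

    return [to_vector(r) for r in train], [to_vector(r) for r in test]


def knn_predict(train_x, train_y, x, k=5):
    """Classify x by majority vote among its k nearest training points."""
    order = sorted(range(len(train_x)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def holdout_accuracy(rows, features, k=5, train_frac=0.7):
    """Train on the first part of the rows and report accuracy on the rest."""
    split = int(len(rows) * train_frac)
    train, test = rows[:split], rows[split:]
    train_x, test_x = scale(train, test, features)
    train_y = [r["osa"] for r in train]
    hits = sum(knn_predict(train_x, train_y, x, k) == r["osa"]
               for x, r in zip(test_x, test))
    return hits / len(test)


def select_features(rows, candidates):
    """Feature selection stand-in: try every non-empty subset, keep the best.

    Scoring on the hold-out split is optimistic; a real study would use a
    separate validation set or cross-validation.
    """
    subsets = [list(c) for n in range(1, len(candidates) + 1)
               for c in combinations(candidates, n)]
    return max(subsets, key=lambda s: holdout_accuracy(rows, s))


records = make_synthetic_records()
groups = group_by(records, lambda r: r["sex"])
results = {}
for name, rows in sorted(groups.items()):
    all_features = ["age", "bmi", "noise"]
    chosen = select_features(rows, all_features)
    results[name] = (holdout_accuracy(rows, all_features),
                     holdout_accuracy(rows, chosen), chosen)
    print(name, results[name])
```

Because the data and labeling rule are synthetic, the printed accuracies say nothing about OSA; the sketch only shows where grouping and feature selection sit relative to the classifier.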