Hybrid statistical and machine-learning approach to hearing-loss identification based on an oversampling technique.

Author information

Wang Tang-Chuan, Sun Ko-Han, Chih Mingchang, Chen Wei-Chun

Affiliations

Department of Otolaryngology-Head and Neck Surgery, China Medical University Hsinchu Hospital, Zhubei City, Hsinchu, 302056, Taiwan, ROC; Department of Master Program for Biomedical Engineering, College of Biomedical Engineering, China Medical University, Taichung, 404328, Taiwan, ROC; School of Medicine, College of Medicine, China Medical University, Taichung, 404328, Taiwan, ROC.

Department of Business Administration, National Chung Hsing University, South District, Taichung, 402202, Taiwan, ROC.

Publication information

Comput Biol Med. 2025 Feb;185:109539. doi: 10.1016/j.compbiomed.2024.109539. Epub 2024 Dec 12.

Abstract

BACKGROUND AND OBJECTIVES

Hearing loss is a major global health hazard that exerts considerable social and physiological effects on spoken language and cognition. Patients affected by this condition may experience social and professional hardships that dominate occupational injuries. Identifying the features of recessive hearing loss is therefore important for clinicians seeking to prevent further disease progression. This work aimed to develop a hybrid statistical and machine-learning approach as a decision-support mechanism. We expect the proposed model to help predict hearing-loss disorders and support clinical diagnosis.

METHODS

A three-phase hybrid approach was proposed to implement classification models. Phase I reduced the number of input variables and selected the most influential features, using a stepwise method and a random forest (RF) technique as filters. Phase II applied the synthetic minority oversampling technique (SMOTE) to oversample the minority class and balance the sample sizes of the target and nontarget classes. Phase III selected the final model from three supervised classifiers, namely logistic regression, multilayer perceptron, and support vector machine (SVM), to identify and predict the case of interest (i.e., hearing loss).
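The oversampling step in phase II can be illustrated with a minimal SMOTE sketch in plain Python. This is not the authors' implementation: the function names `smote` and `balance`, the neighbour count `k`, the fixed seed, and the toy data are all assumptions made for illustration. Each synthetic sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (illustrative sketch)."""
    if n_synthetic <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Append synthetic minority samples until both classes are the same size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k, seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data (hypothetical): six nontarget cases and three target cases.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.1],
     [2.0, 2.0], [2.2, 2.1], [2.1, 1.9]]
y = [0] * 6 + [1] * 3
X_bal, y_bal = balance(X, y, k=2)
print(y_bal.count(0), y_bal.count(1))
```

In practice, oversampling is usually applied only to the training split, so that synthetic points do not leak into the data used for evaluation. The abstract does not state how the split was handled in this study.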

RESULTS

In phase I, the stepwise technique and the RF method selected three and seven features, respectively. The SMOTE technique alleviated the class imbalance and substantially improved predictive capability in phases II and III. In terms of accuracy, precision, recall, and F1 score, the proposed hybrid approach combining the SVM method with the stepwise technique was competitive with the logistic model that used all variables. Furthermore, the SVM models combined with the stepwise and RF techniques outperformed the other approaches in terms of the area under the curve (AUC).
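For readers unfamiliar with the evaluation measures named above, the following sketch computes accuracy, precision, recall, F1 score, and AUC from scratch. The labels, scores, and the 0.5 decision threshold are hypothetical and are not data from the study. The AUC here uses the rank-based formulation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc(y_true, scores, positive=1):
    """Rank-based AUC over all positive/negative score pairs."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case from each class")
    # A pair counts 1 if the positive case outranks the negative, 0.5 on a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = hearing loss) and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

Accuracy alone can look favourable on imbalanced data because a model that always predicts the majority class is right most of the time. Precision, recall, F1 score, and AUC give a more complete picture, which is consistent with the abstract reporting all five.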

CONCLUSION

Compared with multivariate models, the hybrid approach that couples the SVM method with a stepwise technique and/or an RF technique is an efficient alternative. It requires fewer predictors and remains competitive in terms of accuracy, precision, recall, F1 score, and AUC. This work highlights the potential of hybrid statistical and machine-learning approaches. Our model can be used as a screening tool for upfront forecasting in clinical practice. The proposed hybrid approach also demonstrates a strong capability to identify vital features and predict hearing loss.
