Analysis of Parkinson's Disease Using an Imbalanced-Speech Dataset by Employing Decision Tree Ensemble Methods.

Author information

Barukab Omar, Ahmad Amir, Khan Tabrej, Thayyil Kunhumuhammed Mujeeb Rahiman

Affiliations

Department of Information Technology, Faculty of Computing and Information Technology in Rabigh (FCITR), King Abdulaziz University, Jeddah 21589, Saudi Arabia.

College of Information Technology, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates.

Publication information

Diagnostics (Basel). 2022 Nov 30;12(12):3000. doi: 10.3390/diagnostics12123000.

Abstract

Parkinson's disease (PD) currently affects approximately 10 million people worldwide. Detecting PD-positive subjects is vital for disease prognostics, diagnostics, management, and treatment. Several early symptoms, such as speech impairment and changes in writing, are associated with Parkinson's disease. To classify potential PD patients, many researchers have applied machine learning algorithms to various datasets related to the disease. In our research, we study a dataset of PD vocal impairment features, which is imbalanced. We present a comparative performance evaluation of various decision tree ensemble methods, with and without oversampling techniques. In addition, we compare classifier performance across different ensemble sizes and across various minority-to-majority class ratios obtained with oversampling and undersampling. Finally, we combine feature selection with the best-performing ensemble classifiers. The results show that AdaBoost, random forest, and RUSBoost, a decision tree ensemble method developed for imbalanced datasets, perform well on metrics such as precision, recall, F1-score, area under the receiver operating characteristic curve (AUROC), and geometric mean. Further, two feature selection methods, lasso and information gain, were used to select the 10 best features for use with the best ensemble classifiers. AdaBoost with information gain feature selection is the best-performing ensemble method, with an F1-score of 0.903.
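The threshold-based metrics named in the abstract can be illustrated with a short sketch. The function name `imbalance_metrics` and the confusion-matrix counts are hypothetical and are not taken from the paper. The geometric mean is assumed here to follow the common definition, the square root of sensitivity times specificity, and the paper's exact definition may differ.

```python
import math


def imbalance_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute precision, recall, F1-score and geometric mean for the
    positive class from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # recall of the negative class
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # Assumed definition: G-mean = sqrt(sensitivity * specificity).
    g_mean = math.sqrt(recall * specificity)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical counts for illustration only (NOT results from the paper).
metrics = imbalance_metrics(tp=40, fp=10, fn=10, tn=140)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under this definition the geometric mean falls to zero whenever either class is entirely misclassified, which is one reason it is often reported alongside the F1-score for imbalanced data.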

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b1f3/9776735/0b48a71bfd49/diagnostics-12-03000-g001.jpg
