Analysis of Parkinson's Disease Using an Imbalanced-Speech Dataset by Employing Decision Tree Ensemble Methods.

Authors

Barukab Omar, Ahmad Amir, Khan Tabrej, Thayyil Kunhumuhammed Mujeeb Rahiman

Affiliations

Department of Information Technology, Faculty of Computing and Information Technology in Rabigh (FCITR), King Abdulaziz University, Jeddah 21589, Saudi Arabia.

College of Information Technology, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates.

Publication information

Diagnostics (Basel). 2022 Nov 30;12(12):3000. doi: 10.3390/diagnostics12123000.

DOI:10.3390/diagnostics12123000

PMID:36553007

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9776735/

Abstract

Parkinson's disease (PD) currently affects approximately 10 million people worldwide. Detecting PD-positive subjects is vital for disease prognostics, diagnostics, management and treatment. Several early symptoms, such as speech impairment and changes in writing, are associated with Parkinson's disease. To classify potential PD patients, many researchers have applied machine learning algorithms to various datasets related to the disease. In our research, we study the PD vocal impairment feature dataset, which is imbalanced. We present a comparative performance evaluation of various decision tree ensemble methods, with and without oversampling techniques. In addition, we compare the performance of classifiers across different ensemble sizes and various minority-to-majority class ratios under oversampling and undersampling. Finally, we combine feature selection with the best-performing ensemble classifiers. The results show that AdaBoost, random forest, and RUSBoost, a decision tree ensemble developed for imbalanced datasets, perform well on metrics such as precision, recall, F1-score, area under the receiver operating characteristic curve (AUROC) and the geometric mean. Further, two feature selection methods, lasso and information gain, were used to select the 10 best features for the best ensemble classifiers. AdaBoost with information gain feature selection is the best-performing ensemble method, with an F1-score of 0.903.
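
The building blocks named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code: the function names (`random_oversample`, `binary_metrics`, `information_gain`) are hypothetical, the geometric mean is assumed to follow the common definition for imbalanced binary problems (square root of sensitivity times specificity), and information gain is computed for an already discretized feature, whereas continuous vocal features would first need binning.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, minority_label, ratio=1.0, seed=0):
    """Duplicate random minority rows until minority/majority reaches `ratio`.

    Assumes a binary problem: every label other than `minority_label`
    counts as the majority class.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    if not minority:
        raise ValueError("no samples with the minority label")
    n_majority = len(y) - len(minority)
    target = math.ceil(ratio * n_majority)
    extra = [rng.choice(minority) for _ in range(max(0, target - len(minority)))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1 and geometric mean from binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Assumed definition: sqrt(sensitivity * specificity).
    g_mean = math.sqrt(recall * specificity)
    return {"precision": precision, "recall": recall,
            "f1": f1, "g_mean": g_mean}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

Ranking features by `information_gain` and keeping the top 10 would mirror the screening step described above. Oversampling should be applied to the training split only, so that duplicated minority samples do not leak into the evaluation data.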

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b1f3/9776735/0b48a71bfd49/diagnostics-12-03000-g001.jpg

Similar articles

Parkinson's Disease Detection Using Filter Feature Selection and a Genetic Algorithm with Ensemble Learning.

Diagnostics (Basel). 2023 Aug 31;13(17):2816. doi: 10.3390/diagnostics13172816.

Analysis of sampling techniques for imbalanced data: An n = 648 ADNI study.

Neuroimage. 2014 Feb 15;87:220-41. doi: 10.1016/j.neuroimage.2013.10.005. Epub 2013 Oct 29.

Prediction of diabetes disease using an ensemble of machine learning multi-classifier models.

BMC Bioinformatics. 2023 Sep 12;24(1):337. doi: 10.1186/s12859-023-05465-z.

Comparison of Resampling Techniques for Imbalanced Datasets in Machine Learning: Application to Epileptogenic Zone Localization From Interictal Intracranial EEG Recordings in Patients With Focal Epilepsy.

Front Neuroinform. 2021 Nov 19;15:715421. doi: 10.3389/fninf.2021.715421. eCollection 2021.

Combining Resampling Strategies and Ensemble Machine Learning Methods to Enhance Prediction of Neonates with a Low Apgar Score After Induction of Labor in Northern Tanzania.

Risk Manag Healthc Policy. 2021 Sep 7;14:3711-3720. doi: 10.2147/RMHP.S331077. eCollection 2021.

Diagnosis and classification of Parkinson's disease using ensemble learning and 1D-PDCovNN.

Comput Biol Med. 2023 Jul;161:107031. doi: 10.1016/j.compbiomed.2023.107031. Epub 2023 May 17.

A multiple combined method for rebalancing medical data with class imbalances.

Comput Biol Med. 2021 Jul;134:104527. doi: 10.1016/j.compbiomed.2021.104527. Epub 2021 May 31.

Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

J Theor Biol. 2017 Dec 21;435:208-217. doi: 10.1016/j.jtbi.2017.09.018. Epub 2017 Sep 20.

Conversion of adverse data corpus to shrewd output using sampling metrics.

Vis Comput Ind Biomed Art. 2020 Aug 11;3(1):19. doi: 10.1186/s42492-020-00055-9.

Cited by

Hybrid preprocessing and ensemble classification for enhanced detection of Parkinson's disease using multiple speech signal databases.

Digit Health. 2025 Jun 26;11:20552076251352941. doi: 10.1177/20552076251352941. eCollection 2025 Jan-Dec.

A quantum inspired machine learning approach for multimodal Parkinson's disease screening.

Sci Rep. 2025 Apr 4;15(1):11660. doi: 10.1038/s41598-025-95315-0.

Construction and validation of risk prediction models for pulmonary embolism in hospitalized patients based on different machine learning methods.

Front Cardiovasc Med. 2024 Jun 25;11:1308017. doi: 10.3389/fcvm.2024.1308017. eCollection 2024.

Applied Machine Learning Techniques to Diagnose Voice-Affecting Conditions and Disorders: Systematic Literature Review.

J Med Internet Res. 2023 Jul 19;25:e46105. doi: 10.2196/46105.

Automatic and Early Detection of Parkinson's Disease by Analyzing Acoustic Signals Using Classification Algorithms Based on Recursive Feature Elimination Method.

Diagnostics (Basel). 2023 May 31;13(11):1924. doi: 10.3390/diagnostics13111924.
