Stacked Ensemble Learning for Classification of Parkinson's Disease Using Telemonitoring Vocal Features.

Authors

Omodunbi Bolaji A, Olawade David B, Awe Omosigho F, Soladoye Afeez A, Aderinto Nicholas, Ovsepian Saak V, Boussios Stergios

Affiliations

Department of Computer Engineering, Federal University Oye-Ekiti, Oye-Ekiti 371104, Nigeria.

Department of Allied and Public Health, School of Health, Sport and Bioscience, University of East London, London E16 2RD, UK.

Publication information

Diagnostics (Basel). 2025 Jun 9;15(12):1467. doi: 10.3390/diagnostics15121467.

DOI: 10.3390/diagnostics15121467

PMID: 40564788

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12191892/

Abstract

Parkinson's disease (PD) is a progressive neurodegenerative condition that impairs motor and non-motor functions. Early and accurate diagnosis is critical for effective management and care. This study aimed to develop a robust machine learning (ML) prediction system for PD using a stacked ensemble learning approach, addressing challenges such as imbalanced datasets and feature optimization. An open-access PD dataset comprising 22 vocal attributes and 195 instances from 31 subjects was used. To prevent data leakage, subjects were divided into training (22 subjects) and testing (9 subjects) groups, so that no subject appeared in both sets. Preprocessing included data cleaning and normalization via min-max scaling. The synthetic minority oversampling technique (SMOTE) was applied exclusively to the training set to address class imbalance. Three feature selection techniques (forward search, gain ratio, and the Kruskal-Wallis test) were applied under subject-wise cross-validation to identify significant attributes. The system combined support vector machine (SVM), random forest (RF), K-nearest neighbor (KNN), and decision tree (DT) base classifiers with a logistic regression (LR) meta-classifier in a stacked ensemble learning framework. Performance was evaluated using both recording-wise and subject-wise metrics to ensure clinical relevance. On completely unseen subjects, the stacked ensemble achieved a recording-wise accuracy of 84.7% and a subject-wise accuracy of 77.8%, outperforming individual classifiers including KNN (81.4%), RF (79.7%), and SVM (76.3%). Cross-validation within the training set showed 89.2% accuracy; the gap between this figure and the held-out results highlights the importance of proper validation methodology. Feature selection results showed that the top 10 features ranked by gain ratio provided the best balance between performance and clinical interpretability.
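The leakage-prevention steps described above (splitting by subject, then fitting min-max scaling on the training data only) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the helper names, the toy data, and the random split are assumptions made for illustration, and SMOTE, feature selection, and the classifiers themselves are omitted.

```python
import random

def subject_wise_split(subject_ids, n_test_subjects, seed=0):
    """Return (train_idx, test_idx) so that no subject appears in both sets."""
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    test_subjects = set(subjects[:n_test_subjects])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx

def fit_min_max(rows):
    """Learn per-feature minimum and maximum from the training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(rows, mins, maxs):
    """Scale rows with training-set statistics; constant features map to 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]

# Toy data (hypothetical): 6 subjects, 3 recordings each, 2 features per recording.
rng = random.Random(42)
subject_ids = [f"S{k:02d}" for k in range(6) for _ in range(3)]
features = [[rng.uniform(100, 200), rng.uniform(0, 1)] for _ in subject_ids]

train_idx, test_idx = subject_wise_split(subject_ids, n_test_subjects=2)
train_rows = [features[i] for i in train_idx]
test_rows = [features[i] for i in test_idx]

# Scaling statistics come from the training subjects only, then are reused on test data.
mins, maxs = fit_min_max(train_rows)
train_scaled = apply_min_max(train_rows, mins, maxs)
test_scaled = apply_min_max(test_rows, mins, maxs)

train_subjects = {subject_ids[i] for i in train_idx}
test_subjects = {subject_ids[i] for i in test_idx}
print("subjects shared between train and test:", train_subjects & test_subjects)
```

Because the scaler is fitted on training rows only, scaled test values may fall slightly outside [0, 1]; that is expected and is the price of keeping test data out of the fitting step.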
The system's methodological robustness was validated through rigorous subject-wise evaluation, demonstrating the critical impact of validation methodology on reported performance. By implementing subject-wise validation and preventing data leakage, this study demonstrates that proper validation yields substantially different (and more realistic) results compared to flawed recording-wise approaches. The findings underscore the critical importance of validation methodology in healthcare ML applications and provide a template for methodologically sound PD classification research. Future research should focus on validating the model with larger, multi-center datasets and implementing standardized validation protocols to enhance clinical applicability.
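To make the distinction between recording-wise and subject-wise accuracy concrete, the sketch below scores the same toy predictions both ways. The abstract does not say how recording-level predictions were aggregated per subject; majority voting with ties resolved toward the PD class is an assumption made here, and all data and function names are hypothetical.

```python
from collections import Counter, defaultdict

def recording_accuracy(y_true, y_pred):
    """Fraction of individual recordings classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def subject_accuracy(subject_ids, y_true, y_pred):
    """Majority-vote each subject's recording predictions, then score per subject."""
    votes = defaultdict(list)
    truth = {}
    for s, t, p in zip(subject_ids, y_true, y_pred):
        votes[s].append(p)
        truth[s] = t  # all recordings of one subject share a single diagnosis
    correct = 0
    for s, preds in votes.items():
        counts = Counter(preds)
        # Ties go to the positive class (1); an arbitrary choice for this sketch.
        majority = 1 if counts[1] >= counts[0] else 0
        correct += majority == truth[s]
    return correct / len(votes)

# Toy predictions for 3 hypothetical subjects (1 = PD, 0 = healthy).
subject_ids = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
y_true      = [1, 1, 1, 0, 0, 0, 1, 1, 1]
y_pred      = [1, 1, 0, 0, 1, 0, 0, 0, 1]

rec_acc = recording_accuracy(y_true, y_pred)
subj_acc = subject_accuracy(subject_ids, y_true, y_pred)
print(f"recording-wise: {rec_acc:.3f}, subject-wise: {subj_acc:.3f}")

# Arithmetic check: with 9 test subjects, 77.8% corresponds to 7 correct subjects.
print(round(7 / 9 * 100, 1))
```

The two metrics can diverge in either direction: a few subjects with many misclassified recordings lower recording-wise accuracy more than subject-wise accuracy, while with only 9 test subjects each subject-level error moves the subject-wise figure by about 11 percentage points.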

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6732/12191892/e4e918bc2de5/diagnostics-15-01467-g001.jpg
