Non-invasive acoustic classification of adult asthma using an XGBoost model with vocal biomarkers.

Authors

Lyu Yi, Jiang Quan-Cheng, Yuan Shuai, Hong Jing, Chen Chun-Feng, Wu Hai-Mei, Wang Yi-Qin, Shi Yu-Jing, Yan Hai-Xia, Xu Jin

Affiliations

School of Traditional Chinese Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, People's Republic of China.

Shanghai Key Laboratory of Health Identification and Assessment, Shanghai, 201203, People's Republic of China.

Publication information

Sci Rep. 2025 Aug 6;15(1):28682. doi: 10.1038/s41598-025-14645-1.

DOI:10.1038/s41598-025-14645-1

PMID:40770052

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12328801/

Abstract

Conventional diagnosis of asthma, a common chronic respiratory disease, is often limited by factors such as the patient cooperation that spirometry requires. Non-invasive acoustic analysis of vocal characteristics with machine learning offers a promising alternative for objective diagnosis. This study aimed to develop and validate a robust classification model for adult asthma using acoustic features of the sustained vowel /ɑː/. In a case-control design, voice recordings of /ɑː/ were collected from a primary cohort of 214 adults and an independent external validation cohort of 200 adults. Features were extracted using a modified extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS), and seven machine learning models were compared. The top-performing model, Extreme Gradient Boosting (XGBoost), was further assessed through ten-fold cross-validation, external validation, and feature analysis using SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME). On the test set, the XGBoost classifier achieved the highest performance of the seven models, with an accuracy of 0.8514, an area under the curve (AUC) of 0.9130, a recall of 0.8804, a precision of 0.8387, an F1-score of 0.8567, a Kappa coefficient of 0.7018, and a Matthews Correlation Coefficient (MCC) of 0.7071. On the external validation set, the model maintained strong performance, with an accuracy of 0.8100, an AUC of 0.8755, a recall of 0.8300, a precision of 0.7981, an F1-score of 0.8137, a Kappa coefficient of 0.6200, and an MCC of 0.6205. Interpretability analysis identified formant frequencies as the most important acoustic predictors. An XGBoost model built on eGeMAPS features is therefore an accurate and viable non-invasive method for classifying adult asthma, with considerable potential for accessible tools supporting early diagnosis, remote monitoring, and improved asthma management.
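To make the relationship between the reported threshold-based metrics concrete, the following minimal Python sketch computes accuracy, recall, precision, F1-score, Kappa, and MCC from the four cells of a binary confusion matrix. Everything in it beyond the metric names is an assumption for illustration: the function name `binary_metrics` is hypothetical, Kappa is taken to mean Cohen's kappa, and the counts (83, 21, 17, 79) are not reported in the abstract. They are one hypothetical confusion matrix for 200 subjects, assuming 100 asthma cases and 100 controls, chosen because it reproduces the external-validation figures to four decimals. The AUC is omitted because it depends on the ranking of predicted probabilities and cannot be derived from a single confusion matrix.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Threshold-based metrics from binary confusion-matrix counts (illustrative helper)."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn)        # sensitivity: share of true cases detected
    precision = tp / (tp + fp)     # share of positive calls that are true cases
    f1 = 2 * tp / (2 * tp + fp + fn)

    # Cohen's kappa (assumed meaning of "Kappa"): agreement corrected for chance.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (accuracy - p_expected) / (1 - p_expected)

    # Matthews correlation coefficient: correlation between labels and predictions.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "kappa": kappa,
        "mcc": mcc,
    }


# Hypothetical counts (NOT from the paper): 100 cases, 100 controls assumed.
metrics = binary_metrics(tp=83, fp=21, fn=17, tn=79)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts the printed values match the external-validation metrics quoted in the abstract (0.8100, 0.8300, 0.7981, 0.8137, 0.6200, 0.6205), which suggests the reported figures are internally consistent with a balanced 200-subject cohort, although the abstract itself does not state the class split.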
