Blekinge Institute of Technology, Valhallavägen 1, 371 41 Karlskrona, Sweden.
Artif Intell Med. 2024 Oct;156:102953. doi: 10.1016/j.artmed.2024.102953. Epub 2024 Aug 15.
Chronic obstructive pulmonary disease (COPD) is a severe condition that affects millions of people worldwide and causes numerous deaths every year. Because it produces no significant symptoms in its early stages, it is frequently underdiagnosed. Besides pulmonary function failure, COPD also has harmful systemic effects, e.g., heart failure or voice distortion. These systemic effects might, however, provide valuable information for early detection: the symptoms they cause could help detect the condition in its early stages.
The proposed study aims to explore whether the voice features extracted from the vowel "a" utterance carry any information that can be predictive of COPD by employing Machine Learning (ML) on a newly collected voice dataset.
Forty-eight participants were recruited from the pool of research clinic visitors at Blekinge Institute of Technology (BTH) in Sweden between January 2022 and May 2023, yielding a dataset of 1246 recordings. Following an information and consent meeting with each participant, recordings of the vowel "a" utterance were collected using the VoiceDiagnostic application. The collected voice data underwent silence-segment removal and extraction of baseline acoustic features and Mel-Frequency Cepstral Coefficients (MFCC). Sociodemographic data was also collected from the participants. Three ML models were investigated for the binary classification of COPD and healthy controls: Random Forest (RF), Support Vector Machine (SVM), and CatBoost (CB). A nested k-fold cross-validation approach was employed, and the hyperparameters of each ML model were optimized using grid search. To assess performance, accuracy, F1-score, precision, and recall were computed. Afterward, we further examined the best classifier using the Area Under the Curve (AUC), Average Precision (AP), and SHapley Additive exPlanations (SHAP) feature-importance measures.
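The nested cross-validation with grid search described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the study's implementation. The synthetic data, the fold counts, and the parameter grid are all assumptions, since the abstract does not report them. Only a Random Forest is shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the feature table (acoustic and MFCC means, age, gender).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

# Hypothetical search space; the abstract does not list the actual grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

# Inner folds select hyperparameters; outer folds estimate generalization.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=inner_cv,
    scoring="accuracy",
)

# Each outer fold refits the whole grid search on its own training part,
# so the held-out part never influences hyperparameter selection.
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")
print("accuracy per outer fold:", outer_scores)
print("mean accuracy:", outer_scores.mean())
```

With several recordings per participant, a grouped splitter such as scikit-learn's `GroupKFold` may be worth considering so that one person's recordings do not land on both sides of a split. The abstract does not state how the folds were formed.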
The RF, SVM, and CB classifiers achieved maximum accuracies of 77 %, 69 %, and 78 % on the test set and 93 %, 78 %, and 97 % on the validation set, respectively. CB thus outperformed RF and SVM and, on further investigation, produced an AUC of 82 % and an AP of 76 %. In addition to age and gender, the mean values of the baseline acoustic and MFCC features demonstrated high importance and deterministic characteristics for classification performance in both the test and validation sets, though in a different order.
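For readers less familiar with the two ranking metrics reported above, the following dependency-free sketch shows one common way to compute AUC and AP from predicted scores. The function names and the toy labels and scores are illustrative assumptions and are unrelated to the study's data. The AP function assumes there are no tied scores.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Count pairwise wins; a tie between a positive and a negative counts half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision values measured at each positive, by rank."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / hits


# Toy example: two controls (0) and two COPD cases (1).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))            # 0.75
print(average_precision(labels, scores))  # 0.8333...
```

AUC summarizes ranking quality across all thresholds, whereas AP gives more weight to performance on the positive class.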
This study concludes that recordings of the vowel "a" utterance contain information that the CatBoost classifier can capture with high accuracy for the classification of COPD. Additionally, baseline acoustic and MFCC features, in conjunction with age and gender information, can be used for classification and can benefit healthcare by providing decision support in COPD diagnosis.
NCT05897944.