Chen Meimei, Wang Yang, Lei Huangwei, Zhang Fei, Huang Ruina, Yang Zhaoyang
College of Traditional Chinese Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou 350122, China.
Fujian Key Laboratory of Health Status Identification of Traditional Chinese Medicine, Fuzhou 350122, China.
Nan Fang Yi Ke Da Xue Xue Bao. 2025 Apr 20;45(4):711-717. doi: 10.12122/j.issn.1673-4254.2025.04.05.
To construct vocal recognition classification models using 6 machine learning algorithms and vocal emotional characteristics of individuals with subthreshold depression to facilitate early identification of subthreshold depression.
We collected voice data from both normal individuals and participants with subthreshold depression by asking them to read specifically chosen words and texts. From each voice sample, 384-dimensional vocal emotional feature variables were extracted, covering energy, Mel-frequency cepstral coefficients, zero-crossing rate, voicing probability, fundamental frequency, and difference (delta) features. Recursive Feature Elimination (RFE) was used to select voice feature variables. Classification models were then built with 6 machine learning algorithms, namely Adaptive Boosting (AdaBoost), Random Forest (RF), Linear Discriminant Analysis (LDA), Logistic Regression (LR), Lasso Regression (LRLasso), and Support Vector Machine (SVM), and their performance was evaluated. To assess the generalization capability of the models, the best-performing classification models were further evaluated on real-world speech data.
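As an illustration of the frame-level descriptors listed above, the following minimal sketch computes two of them (energy and zero-crossing rate), their difference (delta) tracks, and a few summary statistics that turn the per-frame tracks into a fixed-length vector. It is not the authors' extraction pipeline: the abstract does not name the toolkit or its settings, so the frame size, hop size, the choice of four statistics, and all function names here are assumptions, and the result is a 16-dimensional vector rather than the 384-dimensional set used in the study.

```python
import math
from statistics import mean, pstdev


def frames(signal, size, hop):
    """Split a sequence of samples into overlapping fixed-size frames."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def rms_energy(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def delta(track):
    """First-order difference of a per-frame track (first value padded with 0)."""
    return [0.0] + [b - a for a, b in zip(track, track[1:])]


def functionals(track):
    """Summary statistics over frames (an assumed, reduced set)."""
    return [mean(track), pstdev(track), min(track), max(track)]


def feature_vector(signal, size=400, hop=160):
    """Per-utterance vector: [energy, ZCR, delta-energy, delta-ZCR] x 4 statistics."""
    fs = frames(signal, size, hop)
    tracks = [[rms_energy(f) for f in fs], [zero_crossing_rate(f) for f in fs]]
    tracks += [delta(t) for t in tracks]
    return [v for t in tracks for v in functionals(t)]


# Synthetic 1-second, 16 kHz, 200 Hz tone standing in for a voice recording.
sr = 16000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]
vec = feature_vector(tone)
print(len(vec))  # 2 descriptors x 2 (with deltas) x 4 statistics = 16
```

Vectors of this kind, one per recording, are what a feature-selection step such as RFE and the downstream classifiers would take as input.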
The AdaBoost, RF, and LDA models achieved high prediction accuracies of 100%, 100%, and 93.3%, respectively, on the word-reading speech test set. On the text-reading speech test set, the accuracies of the AdaBoost, RF, and LDA models were 90%, 80%, and 90%, respectively, while the accuracies of the other 3 models were all below 80%. On real-world word-reading and text-reading speech data, the AdaBoost and RF models still achieved high predictive accuracies (91.7% and 80.6% for AdaBoost, and 86.1% and 77.8% for RF, respectively).
Analyzing vocal emotional characteristics allows effective identification of individuals with subthreshold depression. The AdaBoost and RF models show excellent performance in classifying individuals with subthreshold depression and may thus offer valuable assistance in clinical and research settings.