Speech and Sound Processing Lab, Department of Electrical Engineering, K.N. Toosi University of Technology, Tehran, Iran.
Comput Biol Med. 2022 Oct;149:105926. doi: 10.1016/j.compbiomed.2022.105926. Epub 2022 Aug 6.
This study proposes depression detection systems based on the i-vector framework, both for classifying speakers as depressed or healthy and for predicting depression severity as measured by the Beck Depression Inventory-II (BDI-II). Linear and non-linear speech features are investigated as front-end features for i-vector extraction. To exploit the complementary information in these features, i-vector systems based on linear and non-linear features are combined through decision-level fusion. Variability compensation techniques such as Linear Discriminant Analysis (LDA) and Within-Class Covariance Normalization (WCCN) are widely used to reduce unwanted variability, but a technique that generalizes better than LDA is required when training data are limited. To address this problem, we employ support vector discriminant analysis (SVDA), which uses the boundaries between classes to find discriminative directions. Experiments on the 2014 Audio-Visual Emotion Challenge and Workshop (AVEC 2014) depression database indicate that the best accuracy improvement obtained with SVDA is about 15.15% compared to uncompensated i-vectors. In all cases, the experimental results confirm that decision-level fusion of i-vector systems based on three feature sets, TEO-CB-Auto-Env+Δ, Glottal+Δ, and MFCC+Δ+ΔΔ, achieves the best results. This fusion significantly improves classification performance, yielding an accuracy of 90%. For BDI-II score prediction, the combination of SVDA-transformed systems based on these three feature sets achieves an RMSE of 8.899 and an MAE of 6.991 on the test partition, improvements of 29.18% and 30.34%, respectively, over the baseline system. Furthermore, the proposed combination outperforms other audio-based studies in the literature that use the AVEC 2014 database.
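To make two of the building blocks named above concrete, the following is a minimal Python sketch of WCCN and of decision-level fusion, together with the RMSE and MAE metrics used for score prediction. It is not the authors' implementation: the function names (`wccn_projection`, `fuse_scores`, `rmse`, `mae`) are hypothetical, the equal-weight averaging of per-system outputs is an assumption (the abstract does not state the fusion rule), the small ridge term is added only for numerical stability, and SVDA itself is not shown.

```python
import numpy as np


def wccn_projection(X, y, ridge=1e-6):
    """Return B with B @ B.T == inv(W), where W is the average
    within-class covariance of the rows of X (one i-vector per row)."""
    classes = np.unique(y)
    dim = X.shape[1]
    W = np.zeros((dim, dim))
    for c in classes:
        Xc = X[y == c]
        centered = Xc - Xc.mean(axis=0)
        W += centered.T @ centered / len(Xc)
    W /= len(classes)
    # Assumption: a small ridge keeps W invertible when data are scarce.
    W += ridge * np.eye(dim)
    # Cholesky factor of the inverse within-class covariance.
    return np.linalg.cholesky(np.linalg.inv(W))


def fuse_scores(system_scores, weights=None):
    """Decision-level fusion as a weighted average of per-system outputs.

    system_scores has shape (n_systems, n_speakers). Equal weights are
    an assumption made for this sketch.
    """
    scores = np.asarray(system_scores, dtype=float)
    if weights is None:
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    return np.asarray(weights, dtype=float) @ scores


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted scores."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted scores."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))
```

Under these assumptions, i-vectors stored as rows of `X` would be compensated with `X @ wccn_projection(X, y)`, and the BDI-II predictions of the three feature-specific systems would be stacked row-wise and passed to `fuse_scores` before computing `rmse` and `mae` against the reference scores.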