Vázquez-Romero Adrián, Gallardo-Antolín Ascensión
Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Avda. de la Universidad, 30, Leganés, 28911 Madrid, Spain.
Entropy (Basel). 2020 Jun 20;22(6):688. doi: 10.3390/e22060688.
This paper proposes a speech-based method for automatic depression classification. The system is an ensemble of Convolutional Neural Networks (CNNs) and is evaluated using the data and the experimental protocol provided in the Depression Classification Sub-Challenge (DCC) at the 2016 Audio-Visual Emotion Challenge (AVEC-2016). In the pre-processing phase, speech files are represented as sequences of log-spectrograms and randomly sampled to balance positive and negative samples. For the classification task itself, an architecture based on One-Dimensional Convolutional Neural Networks, better suited to this task, is first built. Several of these CNN-based models are then trained with different initializations, and their individual predictions are fused using an Ensemble Averaging algorithm and combined per speaker to obtain the final decision. On the DCC at AVEC-2016, the proposed ensemble system achieves satisfactory results when compared with three alternatives: a reference system based on Support Vector Machines and hand-crafted features, a CNN+LSTM-based system called DepAudionet, and a single CNN-based classifier.
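The fusion step described above can be illustrated with a minimal sketch. It assumes each trained CNN outputs one depression probability per speech segment; the function names, the toy scores, the per-speaker mean, and the 0.5 threshold are all hypothetical choices made for illustration, since the abstract does not specify how the per-speaker combination is done.

```python
import numpy as np


def ensemble_average(model_probs):
    """Average segment-level probabilities across models.

    model_probs has shape (n_models, n_segments); the result has
    shape (n_segments,).
    """
    return np.asarray(model_probs, dtype=float).mean(axis=0)


def per_speaker_decision(segment_probs, speaker_ids, threshold=0.5):
    """Combine fused segment scores into one binary label per speaker.

    Assumption: the speaker score is the mean of that speaker's segment
    scores, thresholded at `threshold`. The abstract does not state the
    actual combination rule.
    """
    segment_probs = np.asarray(segment_probs, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    decisions = {}
    for spk in np.unique(speaker_ids):
        score = segment_probs[speaker_ids == spk].mean()
        decisions[str(spk)] = int(score >= threshold)
    return decisions


# Toy scores from three hypothetical CNNs over four segments.
model_probs = [
    [0.9, 0.7, 0.2, 0.4],
    [0.8, 0.6, 0.3, 0.1],
    [0.7, 0.8, 0.1, 0.4],
]
speaker_ids = ["spk_a", "spk_a", "spk_b", "spk_b"]

fused = ensemble_average(model_probs)
decisions = per_speaker_decision(fused, speaker_ids)
print(fused, decisions)
```

Averaging over models trained from different initializations tends to reduce the variance of any single network's predictions, which is the usual motivation for this kind of ensemble.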