Division of Information and Electronic Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Hokkaido, Japan.
College of Information and Systems, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Hokkaido, Japan.
Int J Environ Res Public Health. 2023 Jan 15;20(2):1588. doi: 10.3390/ijerph20021588.
Audio features are physical quantities that reflect single or complex coordinated movements of the vocal organs. Hence, in speech-based automatic depression classification, it is critical to consider the relationships among audio features. Here, we propose a deep learning-based classification model that discriminates depression and its severity using correlations among audio features. The model represents the correlations between audio features as graph structures and learns speech characteristics using a graph convolutional neural network. We conducted classification experiments under two settings: one in which the same subjects could appear in both the training and test data (Setting 1) and one in which the subjects in the training and test data were completely separated (Setting 2). The results showed that the classification accuracy in Setting 1 significantly exceeded that of existing state-of-the-art methods, whereas the accuracy in Setting 2, a setting not reported in existing studies, was much lower than in Setting 1. We conclude that the proposed model is an effective tool for discriminating recurring patients and their severity levels, but that it has difficulty detecting new depressed patients. For practical application of the model, depression-specific speech regions that appear locally, rather than the entire speech of depressed patients, should be detected and assigned appropriate class labels.
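To make the pipeline described in the abstract concrete, the following is a minimal sketch (not the authors' implementation) of how audio-feature correlations can be turned into a graph and passed through one graph-convolution step. The feature count, correlation threshold, node statistics, and weight shapes are all illustrative assumptions, not values from the paper.

```python
# Minimal sketch, assuming: nodes = audio features, edges = thresholded absolute
# Pearson correlations between feature trajectories, one Kipf & Welling-style
# GCN propagation. All sizes and the 0.3 threshold are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for extracted audio features: n_frames time steps x n_feats features
# (e.g., MFCCs, F0, energy). A real system would use an acoustic feature extractor.
n_frames, n_feats = 200, 16
X_frames = rng.normal(size=(n_frames, n_feats))

# 1) Graph structure: correlate feature trajectories and keep strong edges.
corr = np.corrcoef(X_frames, rowvar=False)            # (n_feats, n_feats)
A = (np.abs(corr) > 0.3).astype(float)                # assumed sparsity threshold
np.fill_diagonal(A, 0.0)

# 2) Node attributes: simple per-feature statistics over time.
H = np.stack([X_frames.mean(axis=0),
              X_frames.std(axis=0),
              X_frames.min(axis=0),
              X_frames.max(axis=0)], axis=1)          # (n_feats, 4)

# 3) One graph-convolution step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)
A_hat = A + np.eye(n_feats)
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
A_norm = (A_hat * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

W = rng.normal(scale=0.1, size=(H.shape[1], 8))       # untrained illustrative weights
H_next = np.maximum(A_norm @ H @ W, 0.0)              # (n_feats, 8) node embeddings

# A classifier head (not shown) would pool these node embeddings into an
# utterance-level vector and predict the depression class or severity label.
print(H_next.shape)
```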