Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:3409-3413. doi: 10.1109/EMBC48229.2022.9871556.
A growing area of mental health research concerns how an individual's degree of depression can be assessed automatically by analyzing multimodal objective markers. When combined with machine learning, however, this task is challenging because the multimodal sequences are unaligned and the amount of annotated training data is limited. In this paper, a novel cross-modal framework for automatic depression severity assessment is proposed. Low-level descriptors (LLDs) are extracted from multiple cues (text, audio, and video), after which multimodal fusion via a cross-modal attention mechanism facilitates the learning of more accurate feature representations. For the features extracted from each modality, the cross-modal attention mechanism continuously updates the input sequence of the target modality, until the Patient Health Questionnaire (PHQ-8) score is finally predicted. Moreover, a Self-Attention Generative Adversarial Network (SAGAN) is employed to increase the amount of training data available for depression severity analysis. Experimental results on the depression sub-challenge datasets of the Audio/Visual Emotion Challenge (AVEC 2017 and AVEC 2019) demonstrate the effectiveness of the proposed method.
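To make the cross-modal fusion step concrete, a minimal sketch in PyTorch follows. It shows queries drawn from the target modality attending to keys and values from a source modality, which is what allows unaligned sequences of different lengths to be fused. The embedding dimension, number of heads, residual LayerNorm update, and the pooling-and-regression head are illustrative assumptions; the abstract does not specify the paper's exact architecture or hyperparameters.

import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Updates the target-modality sequence by attending to a source modality.

    Queries come from the target stream; keys and values come from the source
    stream, so no temporal alignment between the two sequences is required.
    """
    def __init__(self, dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, target: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # target: (batch, T_tgt, dim), source: (batch, T_src, dim);
        # the two sequence lengths may differ because the streams are unaligned.
        attended, _ = self.attn(query=target, key=source, value=source)
        return self.norm(target + attended)  # residual update of the target stream

# Illustrative usage: a frame-level audio LLD sequence is updated with
# word-level text context, then mean-pooled and regressed to a scalar
# PHQ-8 severity estimate (all dimensions are hypothetical).
audio = torch.randn(2, 300, 64)  # e.g. frame-level audio LLDs
text = torch.randn(2, 50, 64)    # e.g. word-level text embeddings
fused = CrossModalAttention()(audio, text)
phq8 = nn.Linear(64, 1)(fused.mean(dim=1))  # (batch, 1) severity score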
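The abstract likewise does not detail how SAGAN is configured for augmentation. The sketch below shows only the self-attention block that distinguishes SAGAN from a plain convolutional GAN, following Zhang et al. (2019), with simplified channel sizes; how generated samples are conditioned on depression severity is not stated in the abstract and is left open here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention over 2-D feature maps (channels >= 8 assumed)."""
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight, starts at 0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (b, hw, c//8)
        k = self.key(x).flatten(2)                    # (b, c//8, hw)
        v = self.value(x).flatten(2)                  # (b, c, hw)
        attn = F.softmax(q @ k, dim=-1)               # (b, hw, hw) attention map
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                   # residual connection

# Illustrative usage on a hypothetical feature map, e.g. from a generator
# synthesizing spectrogram-like training samples.
feat = torch.randn(2, 64, 16, 16)
out = SelfAttention2d(64)(feat)  # same shape, globally contextualized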