Jointly Aligning and Predicting Continuous Emotion Annotations.

Authors

Khorram Soheil, McInnis Melvin G, Mower Provost Emily

Affiliations

Research Fellow in the Departments of Computer Science and Engineering (College of Engineering) and Psychiatry (School of Medicine), University of Michigan.

Thomas B and Nancy Upjohn Woodworth Professor of Bipolar Disorder and Depression, Department of Psychiatry, University of Michigan School of Medicine.

Publication information

IEEE Trans Affect Comput. 2021 Oct-Dec;12(4):1069-1083. doi: 10.1109/taffc.2019.2917047. Epub 2019 May 16.

DOI:10.1109/taffc.2019.2917047

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9205566/

Abstract

Time-continuous dimensional descriptions of emotions (e.g., arousal, valence) allow researchers to characterize short-time changes and to capture long-term trends in emotion expression. However, continuous emotion labels are generally not synchronized with the input speech signal because of delays caused by reaction time, which is inherent in human evaluations. To deal with this challenge, we introduce a new convolutional neural network (multi-delay sinc network) that is able to simultaneously align and predict labels in an end-to-end manner. The proposed network is a stack of convolutional layers followed by an aligner network that aligns the speech signal and emotion labels. This network is implemented using a new convolutional layer that we introduce, the delayed sinc layer. It is a time-shifted low-pass (sinc) filter that uses a gradient-based algorithm to learn a single delay. Multiple delayed sinc layers can be used to compensate for a non-stationary delay that is a function of the acoustic space. We test the efficacy of this system on two common emotion datasets, RECOLA and SEWA, and show that this approach obtains state-of-the-art speech-only results by learning time-varying delays while predicting dimensional descriptors of emotions.
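
The delayed sinc idea in the abstract can be illustrated with a small toy example. The sketch below is not the authors' implementation: it assumes a Hann-windowed sinc kernel, a cutoff of 0.1 cycles per sample, a synthetic sinusoid standing in for network outputs, and a finite-difference estimate of the gradient in place of backpropagation. All function names and parameter values are hypothetical and chosen for illustration only.

```python
import math


def sinc(u):
    """Normalized sinc, sin(pi*u)/(pi*u), with the value 1 filled in at u = 0."""
    if abs(u) < 1e-12:
        return 1.0
    return math.sin(math.pi * u) / (math.pi * u)


def delayed_sinc_kernel(delay, cutoff=0.1, half_width=32, window=24):
    """Taps h[n], n = -half_width..half_width, of a low-pass sinc filter
    whose peak is shifted to n = delay (delay may be fractional).

    Assumption: a Hann window of half-width `window` moves with the peak,
    so the sketch is only meant for |delay| <= half_width - window.
    """
    taps = []
    for n in range(-half_width, half_width + 1):
        t = n - delay
        hann = 0.5 * (1.0 + math.cos(math.pi * t / window)) if abs(t) <= window else 0.0
        taps.append(2.0 * cutoff * sinc(2.0 * cutoff * t) * hann)
    return taps


def filter_at(x, taps, t):
    """Single output sample y[t] = sum_n h[n] * x[t - n].
    The caller keeps t at least half_width samples away from both edges."""
    half_width = len(taps) // 2
    return sum(h * x[t - n] for n, h in zip(range(-half_width, half_width + 1), taps))


def alignment_loss(delay, x, target, margin=50):
    """Mean squared error between the delayed signal and the labels,
    evaluated on interior samples only to avoid edge effects."""
    taps = delayed_sinc_kernel(delay)
    errors = [filter_at(x, taps, t) - target[t] for t in range(margin, len(x) - margin)]
    return sum(e * e for e in errors) / len(errors)


def fit_delay(x, target, init=0.0, lr=100.0, steps=60, eps=1e-3):
    """Gradient descent on the delay. The gradient is estimated with a
    central difference (an assumption made to keep the sketch dependency-free)."""
    delay = init
    for _ in range(steps):
        grad = (alignment_loss(delay + eps, x, target)
                - alignment_loss(delay - eps, x, target)) / (2.0 * eps)
        delay -= lr * grad
    return delay


# Toy data: the "labels" are the "signal" lagging by 5 samples,
# mimicking an annotator's reaction-time delay.
omega = 2.0 * math.pi * 0.01
true_delay = 5.0
signal = [math.sin(omega * t) for t in range(300)]
labels = [math.sin(omega * (t - true_delay)) for t in range(300)]
estimated_delay = fit_delay(signal, labels)
print(f"estimated delay: {estimated_delay:.3f} samples")
```

Because the kernel is a smooth function of `delay`, the loss is differentiable with respect to the delay, which is what makes a gradient-based update possible. A network with several such layers could, as the abstract describes, learn a different delay for different regions of the acoustic space; that part is not shown here.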

Similar articles

1

Jointly Aligning and Predicting Continuous Emotion Annotations.

IEEE Trans Affect Comput. 2021 Oct-Dec;12(4):1069-1083. doi: 10.1109/taffc.2019.2917047. Epub 2019 May 16.

2

Speech emotion recognition based on improved masking EMD and convolutional recurrent neural network.

Front Psychol. 2023 Jan 9;13:1075624. doi: 10.3389/fpsyg.2022.1075624. eCollection 2022.

3

Multi-resolution modulation-filtered cochleagram feature for LSTM-based dimensional emotion recognition from speech.

Neural Netw. 2021 Aug;140:261-273. doi: 10.1016/j.neunet.2021.03.027. Epub 2021 Mar 25.

4

A comprehensive study on bilingual and multilingual speech emotion recognition using a two-pass classification scheme.

PLoS One. 2019 Aug 15;14(8):e0220386. doi: 10.1371/journal.pone.0220386. eCollection 2019.

5

Interpretable and lightweight convolutional neural network for EEG decoding: Application to movement execution and imagination.

Neural Netw. 2020 Sep;129:55-74. doi: 10.1016/j.neunet.2020.05.032. Epub 2020 May 29.

6

An improved multi-input deep convolutional neural network for automatic emotion recognition.

Front Neurosci. 2022 Oct 4;16:965871. doi: 10.3389/fnins.2022.965871. eCollection 2022.

7

Emotion Recognition Using Electrodermal Activity Signals and Multiscale Deep Convolutional Neural Network.

J Med Syst. 2021 Mar 4;45(4):49. doi: 10.1007/s10916-020-01676-6.

8

Multi-channel EEG-based emotion recognition via a multi-level features guided capsule network.

Comput Biol Med. 2020 Aug;123:103927. doi: 10.1016/j.compbiomed.2020.103927. Epub 2020 Jul 22.

9

Multi-Input Speech Emotion Recognition Model Using Mel Spectrogram and GeMAPS.

Sensors (Basel). 2023 Feb 3;23(3):1743. doi: 10.3390/s23031743.

10

Speech Emotion Recognition Using Convolution Neural Networks and Multi-Head Convolutional Transformer.

Sensors (Basel). 2023 Jul 7;23(13):6212. doi: 10.3390/s23136212.

Cited by

1

Convolutional Neural Network-based Speech Enhancement for Cochlear Implant Recipients.

Interspeech. 2019 Sep;2019:4265-4269. doi: 10.21437/interspeech.2019-1850.
