Shared acoustic codes underlie emotional communication in music and speech-Evidence from deep transfer learning.

Author Information

Coutinho Eduardo, Schuller Björn

Affiliations

Department of Music, University of Liverpool, Liverpool, United Kingdom.

Department of Computing, Imperial College London, London, United Kingdom.

Publication Information

PLoS One. 2017 Jun 28;12(6):e0179289. doi: 10.1371/journal.pone.0179289. eCollection 2017.

Abstract

Music and speech exhibit striking similarities in the communication of emotions in the acoustic domain: the communication of specific emotions is achieved, at least to a certain extent, by means of shared acoustic patterns. From an Affective Sciences point of view, determining the degree of overlap between the two domains is fundamental to understanding the shared mechanisms underlying this phenomenon. From a machine learning perspective, the overlap between the acoustic codes for emotional expression in music and speech opens new possibilities for enlarging the amount of data available to develop music and speech emotion recognition systems. In this article, we investigate time-continuous predictions of emotion (Arousal and Valence) in music and speech, and Transfer Learning between these domains. We establish a comparative framework including intra-domain experiments (models trained and tested on the same modality, either music or speech) and cross-domain experiments (models trained on one modality and tested on the other). In the cross-domain context, we evaluated two strategies: direct transfer between domains, and the contribution of Transfer Learning techniques (feature-representation transfer based on Denoising Auto-Encoders) to reducing the gap between the feature-space distributions. Our results demonstrate excellent cross-domain generalisation performance, with and without feature-representation transfer, in both directions. In the case of music, cross-domain approaches outperformed intra-domain models for Valence estimation, whereas for speech, intra-domain models achieved the best performance. This is the first demonstration of shared acoustic codes for emotional expression in music and speech in the time-continuous domain.
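To make the feature-representation-transfer strategy concrete, below is a minimal Python/PyTorch sketch of a denoising auto-encoder trained on pooled acoustic features from both domains, whose encoder then serves as a shared feature extractor. This is not the authors' implementation: the feature dimensionality (260), hidden size, noise level, training schedule, and the random matrices standing in for music/speech feature frames are all placeholder assumptions.

import torch
import torch.nn as nn

class DenoisingAutoEncoder(nn.Module):
    """Single-hidden-layer denoising auto-encoder: corrupt the input,
    then reconstruct the clean version from the hidden code."""
    def __init__(self, n_features: int, n_hidden: int, noise_std: float = 0.1):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.Tanh())
        self.decoder = nn.Linear(n_hidden, n_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        corrupted = x + self.noise_std * torch.randn_like(x)  # additive Gaussian corruption
        return self.decoder(self.encoder(corrupted))

# Hypothetical frame-level acoustic feature matrices (rows = time frames);
# 260 dimensions is an arbitrary placeholder, not the paper's feature set.
X_music = torch.randn(1024, 260)
X_speech = torch.randn(1024, 260)
X_joint = torch.cat([X_music, X_speech])  # pool both domains

dae = DenoisingAutoEncoder(n_features=260, n_hidden=128)
optimiser = torch.optim.Adam(dae.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(50):  # train by reconstructing the clean pooled features
    optimiser.zero_grad()
    loss = loss_fn(dae(X_joint), X_joint)
    loss.backward()
    optimiser.step()

# The encoder output is the shared representation: a time-continuous
# regressor for Arousal/Valence trained on encoded music features can then
# be applied to encoded speech features, and vice versa.
with torch.no_grad():
    Z_speech = dae.encoder(X_speech)

The design point the sketch illustrates is that the auto-encoder is trained without emotion labels: it only aligns the two feature distributions, so a regressor fitted in one domain can be transferred to the other through the common code.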

Fig 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/adea/5489171/f53b6aa6b71c/pone.0179289.g001.jpg
