On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common.

Affiliation

Machine Intelligence and Signal Processing Group, Mensch-Maschine-Kommunikation, Technische Universität München, Munich, Germany.

Publication information

Front Psychol. 2013 May 27;4:292. doi: 10.3389/fpsyg.2013.00292. eCollection 2013.

DOI: 10.3389/fpsyg.2013.00292

PMID: 23750144

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3664314/

Abstract

Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning each of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow's pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of "the sound that something makes," in order to evaluate the system's auditory environment and its own audio output. This article aims at a first step toward a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that by selecting appropriate descriptors, cross-domain arousal and valence regression is feasible, achieving significant correlations with the observer annotations of up to 0.78 for arousal (training on sound and testing on enacted speech) and 0.60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects.
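The cross-domain evaluation described in the abstract (fit a regressor on one audio domain, predict on another, and correlate the predictions with observer annotations) can be sketched roughly as follows. This is only an illustrative sketch: the abstract does not state which regressor or features were used, so ridge regression stands in for the learner, and the feature matrices and arousal labels are synthetic stand-ins generated by the hypothetical helper `make_domain`.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; returns (weights, bias)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def make_domain(rng, n, w_true, noise):
    """Hypothetical synthetic stand-in for one audio domain:
    random 'acoustic features' and noisy 'arousal' annotations."""
    X = rng.normal(size=(n, w_true.size))
    y = X @ w_true + noise * rng.normal(size=n)
    return X, y

rng = np.random.default_rng(0)
# Assumption for illustration: both domains share one feature-to-arousal mapping.
w_shared = np.array([1.0, -0.5, 0.0, 0.8])
X_sound, y_sound = make_domain(rng, 200, w_shared, noise=0.5)
X_speech, y_speech = make_domain(rng, 150, w_shared, noise=0.5)

# Train on the "sound" domain, test on the "speech" domain.
w, b = fit_ridge(X_sound, y_sound)
r = pearson(X_speech @ w + b, y_speech)
print(f"cross-domain correlation: {r:.2f}")
```

Because the synthetic domains share the same underlying mapping by construction, the correlation here is high; with real corpora the value would depend on how consistently the chosen descriptors encode arousal or valence across domains.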

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c78c/3664314/f21f13c9b453/fpsyg-04-00292-g001.jpg

Similar articles

On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common.

Front Psychol. 2013 May 27;4:292. doi: 10.3389/fpsyg.2013.00292. eCollection 2013.

Shared acoustic codes underlie emotional communication in music and speech-Evidence from deep transfer learning.

PLoS One. 2017 Jun 28;12(6):e0179289. doi: 10.1371/journal.pone.0179289. eCollection 2017.

Perception and Modeling of Affective Qualities of Musical Instrument Sounds across Pitch Registers.

Front Psychol. 2017 Feb 8;8:153. doi: 10.3389/fpsyg.2017.00153. eCollection 2017.

Do Individual Differences Influence Moment-by-Moment Reports of Emotion Perceived in Music and Speech Prosody?

Front Behav Neurosci. 2018 Aug 27;12:184. doi: 10.3389/fnbeh.2018.00184. eCollection 2018.

Processing emotions in sounds: cross-domain aftereffects of vocal utterances and musical sounds.

Cogn Emot. 2017 Dec;31(8):1610-1626. doi: 10.1080/02699931.2016.1255588. Epub 2016 Nov 16.

Effects of musical expertise on oscillatory brain activity in response to emotional sounds.

Neuropsychologia. 2017 Aug;103:96-105. doi: 10.1016/j.neuropsychologia.2017.07.014. Epub 2017 Jul 15.

The family oriented musical training for children with cochlear implants: speech and musical perception results of two year follow-up.

Int J Pediatr Otorhinolaryngol. 2009 Jul;73(7):1043-52. doi: 10.1016/j.ijporl.2009.04.009. Epub 2009 May 2.

Sounds of emotion: production and perception of affect-related vocal acoustics.

Ann N Y Acad Sci. 2003 Dec;1000:244-65. doi: 10.1196/annals.1280.012.

What Constitutes a Phrase in Sound-Based Music? A Mixed-Methods Investigation of Perception and Acoustics.

PLoS One. 2016 Dec 20;11(12):e0167643. doi: 10.1371/journal.pone.0167643. eCollection 2016.

Preconceptual Spectral and Temporal Cues as a Source of Meaning in Speech and Music.

Brain Sci. 2019 Mar 1;9(3):53. doi: 10.3390/brainsci9030053.

Cited by

A software pipeline for systematizing machine learning of speech data.

Front Psychiatry. 2025 Jul 29;16:1451368. doi: 10.3389/fpsyt.2025.1451368. eCollection 2025.

Effects of stimulus amplitude-scaling approach on emotional responses to non-speech sounds.

PLoS One. 2025 Jul 31;20(7):e0328659. doi: 10.1371/journal.pone.0328659. eCollection 2025.

Exploring the impact of noise, language familiarity, and experimental settings on emotion recognition.

Front Psychol. 2025 Jun 25;16:1548975. doi: 10.3389/fpsyg.2025.1548975. eCollection 2025.

Emotional impact of AI-generated vs. human-composed music in audiovisual media: A biometric and self-report study.

PLoS One. 2025 Jun 25;20(6):e0326498. doi: 10.1371/journal.pone.0326498. eCollection 2025.

Unraveling the associations between voice pitch and major depressive disorder: a multisite genetic study.

Mol Psychiatry. 2025 Jun;30(6):2686-2695. doi: 10.1038/s41380-024-02877-y. Epub 2024 Dec 31.

Unraveling the Associations Between Voice Pitch and Major Depressive Disorder: A Multisite Genetic Study.

medRxiv. 2024 Oct 13:2024.10.12.24315366. doi: 10.1101/2024.10.12.24315366.

Fusion of PCA and ICA in Statistical Subset Analysis for Speech Emotion Recognition.

Sensors (Basel). 2024 Sep 2;24(17):5704. doi: 10.3390/s24175704.

Detection of Mild Cognitive Impairment From Non-Semantic, Acoustic Voice Features: The Framingham Heart Study.

JMIR Aging. 2024 Aug 22;7:e55126. doi: 10.2196/55126.

Cochlear-implant simulated spectral degradation attenuates emotional responses to environmental sounds.

Int J Audiol. 2025 May;64(5):518-524. doi: 10.1080/14992027.2024.2385552. Epub 2024 Aug 15.

Association Between Acoustic Features and Brain Volumes: the Framingham Heart Study.

Front Dement. 2023;2. doi: 10.3389/frdem.2023.1214940. Epub 2023 Nov 23.
