

On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common.

Affiliations

Machine Intelligence and Signal Processing Group, Mensch-Maschine-Kommunikation, Technische Universität München, Munich, Germany.

Publication

Front Psychol. 2013 May 27;4:292. doi: 10.3389/fpsyg.2013.00292. eCollection 2013.

Abstract

Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning each of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow's pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of "the sound that something makes," in order to evaluate the system's auditory environment and its own audio output. This article aims at a first step toward a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that by selection of appropriate descriptors, cross-domain arousal and valence regression is feasible, achieving significant correlations with the observer annotations of up to 0.78 for arousal (training on sound and testing on enacted speech) and 0.60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects.
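The cross-domain evaluation protocol summarized above can be sketched compactly: train a regressor for arousal (or valence) on the clips of one domain and measure Pearson's correlation between its predictions and the observer annotations of another domain. The Python sketch below illustrates that protocol only; the feature matrices, file organization, and the choice of a linear support vector regressor are assumptions for illustration, not the paper's exact pipeline (the authors' actual features come from standard acoustic feature extraction toolkits).

```python
# Illustrative cross-domain arousal/valence regression sketch.
# Feature extraction, data, and regressor choice are assumptions,
# not the exact setup used in the paper.
import numpy as np
from scipy.stats import pearsonr
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def evaluate_cross_domain(X_train, y_train, X_test, y_test):
    """Train on one audio domain (e.g., sound events) and test on another
    (e.g., enacted speech); return Pearson's r against observer labels."""
    model = make_pipeline(StandardScaler(), SVR(kernel="linear", C=1.0))
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    r, p = pearsonr(y_test, y_pred)
    return r, p


if __name__ == "__main__":
    # Hypothetical pre-extracted acoustic features and arousal labels,
    # one row per clip (replace with real per-clip acoustic descriptors).
    rng = np.random.default_rng(0)
    X_sound, y_sound = rng.normal(size=(200, 60)), rng.uniform(-1, 1, 200)
    X_speech, y_speech = rng.normal(size=(150, 60)), rng.uniform(-1, 1, 150)

    r, p = evaluate_cross_domain(X_sound, y_sound, X_speech, y_speech)
    print(f"Cross-domain arousal correlation: r = {r:.2f} (p = {p:.3g})")
```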


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c78c/3664314/f21f13c9b453/fpsyg-04-00292-g001.jpg
