


Perception and classification of emotions in nonsense speech: Humans versus machines.

Affiliations

Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria.

Human-centered AI Group, Linz Institute of Technology (LIT), Linz, Austria.

Publication Information

PLoS One. 2023 Jan 30;18(1):e0281079. doi: 10.1371/journal.pone.0281079. eCollection 2023.

DOI: 10.1371/journal.pone.0281079
PMID: 36716307
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9886254/
Abstract

This article contributes to a more adequate modelling of emotions encoded in speech, by addressing four fallacies prevalent in traditional affective computing: First, studies concentrate on few emotions and disregard all other ones ('closed world'). Second, studies use clean (lab) data or real-life ones but do not compare clean and noisy data in a comparable setting ('clean world'). Third, machine learning approaches need large amounts of data; however, their performance has not yet been assessed by systematically comparing different approaches and different sizes of databases ('small world'). Fourth, although human annotations of emotion constitute the basis for automatic classification, human perception and machine classification have not yet been compared on a strict basis ('one world'). Finally, we deal with the intrinsic ambiguities of emotions by interpreting the confusions between categories ('fuzzy world'). We use acted nonsense speech from the GEMEP corpus, emotional 'distractors' as categories not entailed in the test set, real-life noises that mask the clear recordings, and different sizes of the training set for machine learning. We show that machine learning based on state-of-the-art feature representations (wav2vec2) is able to mirror the main emotional categories ('pillars') present in perceptual emotional constellations even in degraded acoustic conditions.
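The noise-masking manipulation described in the abstract (real-life noises superimposed on the clean recordings) can be sketched as a mix at a target signal-to-noise ratio. The snippet below is an illustrative NumPy sketch, not the authors' actual pipeline; the signals, sampling rate, and function name are assumptions for the example.

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise` to `clean`, scaled so the clean-to-noise power
    ratio equals `snr_db` decibels."""
    # Tile or truncate the noise clip to match the clean signal's length.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]

    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve p_clean / (gain^2 * p_noise) = 10^(snr_db / 10) for the gain.
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise

# Toy example: a 1-second 220 Hz tone at 16 kHz, masked at 0 dB SNR
# by a shorter white-noise clip (tiled to full length).
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 16000, endpoint=False)
clean = np.sin(2 * np.pi * 220 * t)
noise = rng.standard_normal(8000)
noisy = mix_at_snr(clean, noise, snr_db=0.0)
```

Lowering `snr_db` makes the masking noise progressively dominate, which is how a 'clean world' recording can be degraded in a controlled, comparable way.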

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4eb/9886254/584d1a8bb525/pone.0281079.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4eb/9886254/e4252b5d88bd/pone.0281079.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4eb/9886254/8954a1ab0851/pone.0281079.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4eb/9886254/29543e7d0ed5/pone.0281079.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4eb/9886254/aa5474664c56/pone.0281079.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4eb/9886254/44d208d056fa/pone.0281079.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4eb/9886254/91f69a44bf05/pone.0281079.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4eb/9886254/33ab119eed58/pone.0281079.g007.jpg

Similar Articles

1
Perception and classification of emotions in nonsense speech: Humans versus machines.
PLoS One. 2023 Jan 30;18(1):e0281079. doi: 10.1371/journal.pone.0281079. eCollection 2023.
2
Comparing Manual and Machine Annotations of Emotions in Non-acted Speech.
Annu Int Conf IEEE Eng Med Biol Soc. 2018 Jul;2018:4241-4244. doi: 10.1109/EMBC.2018.8513230.
3
A comprehensive study on bilingual and multilingual speech emotion recognition using a two-pass classification scheme.
PLoS One. 2019 Aug 15;14(8):e0220386. doi: 10.1371/journal.pone.0220386. eCollection 2019.
4
Robust Multi-Scenario Speech-Based Emotion Recognition System.
Sensors (Basel). 2022 Mar 18;22(6):2343. doi: 10.3390/s22062343.
5
Emotions in [a]: a perceptual and acoustic study.
Logoped Phoniatr Vocol. 2006;31(1):43-8. doi: 10.1080/14015430500293926.
6
Emotional speech acoustic model for Malay: iterative versus isolated unit training.
J Acoust Soc Am. 2013 Oct;134(4):3057-66. doi: 10.1121/1.4818741.
7
Exploring Prosodic Features Modelling for Secondary Emotions Needed for Empathetic Speech Synthesis.
Sensors (Basel). 2023 Mar 10;23(6):2999. doi: 10.3390/s23062999.
8
Strength Is in Numbers: Can Concordant Artificial Listeners Improve Prediction of Emotion from Speech?
PLoS One. 2016 Aug 26;11(8):e0161752. doi: 10.1371/journal.pone.0161752. eCollection 2016.
9
The primacy of categories in the recognition of 12 emotions in speech prosody across two cultures.
Nat Hum Behav. 2019 Apr;3(4):369-382. doi: 10.1038/s41562-019-0533-6. Epub 2019 Mar 11.
10
Shared acoustic codes underlie emotional communication in music and speech-Evidence from deep transfer learning.
PLoS One. 2017 Jun 28;12(6):e0179289. doi: 10.1371/journal.pone.0179289. eCollection 2017.

Cited By

1
Exploring emotions in Bach chorales: a multi-modal perceptual and data-driven study.
R Soc Open Sci. 2023 Dec 20;10(12):230574. doi: 10.1098/rsos.230574. eCollection 2023 Dec.

References

1
Dawn of the Transformer Era in Speech Emotion Recognition: Closing the Valence Gap.
IEEE Trans Pattern Anal Mach Intell. 2023 Sep;45(9):10745-10759. doi: 10.1109/TPAMI.2023.3263585. Epub 2023 Aug 7.
2
The perception of emotional cues by children in artificial background noise.
Int J Speech Technol. 2020;23(1):169-182. doi: 10.1007/s10772-020-09675-1. Epub 2020 Jan 22.
3
The paradoxical role of emotional intensity in the perception of vocal affect.
Sci Rep. 2021 May 6;11(1):9663. doi: 10.1038/s41598-021-88431-0.
4
Semantic segmentation of HeLa cells: An objective comparison between one traditional algorithm and four deep-learning architectures.
PLoS One. 2020 Oct 2;15(10):e0230605. doi: 10.1371/journal.pone.0230605. eCollection 2020.
5
The primacy of categories in the recognition of 12 emotions in speech prosody across two cultures.
Nat Hum Behav. 2019 Apr;3(4):369-382. doi: 10.1038/s41562-019-0533-6. Epub 2019 Mar 11.
6
Cross-cultural emotional prosody recognition: evidence from Chinese and British listeners.
Cogn Emot. 2014;28(2):230-44. doi: 10.1080/02699931.2013.812033. Epub 2013 Jul 17.
7
Noise pollution: a modern plague.
South Med J. 2007 Mar;100(3):287-94. doi: 10.1097/smj.0b013e3180318be5.
8
Acoustic profiles in vocal emotion expression.
J Pers Soc Psychol. 1996 Mar;70(3):614-36. doi: 10.1037//0022-3514.70.3.614.
9
What's basic about basic emotions?
Psychol Rev. 1990 Jul;97(3):315-31. doi: 10.1037/0033-295x.97.3.315.