Perception and classification of emotions in nonsense speech: Humans versus machines.

Affiliations

Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria.

Human-centered AI Group, Linz Institute of Technology (LIT), Linz, Austria.

Publication information

PLoS One. 2023 Jan 30;18(1):e0281079. doi: 10.1371/journal.pone.0281079. eCollection 2023.

Abstract

This article contributes to a more adequate modelling of emotions encoded in speech, by addressing four fallacies prevalent in traditional affective computing: First, studies concentrate on a few emotions and disregard all others ('closed world'). Second, studies use either clean (lab) data or real-life data but do not compare clean and noisy data in a comparable setting ('clean world'). Third, machine learning approaches need large amounts of data; however, their performance has not yet been assessed by systematically comparing different approaches and different sizes of databases ('small world'). Fourth, although human annotations of emotion constitute the basis for automatic classification, human perception and machine classification have not yet been compared on a strict basis ('one world'). Finally, we deal with the intrinsic ambiguities of emotions by interpreting the confusions between categories ('fuzzy world'). We use acted nonsense speech from the GEMEP corpus, emotional 'distractors' as categories not entailed in the test set, real-life noises that mask the clear recordings, and different sizes of the training set for machine learning. We show that machine learning based on state-of-the-art feature representations (wav2vec2) is able to mirror the main emotional categories ('pillars') present in perceptual emotional constellations even in degraded acoustic conditions.
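The abstract mentions real-life noises that mask the clear recordings but does not say how the two were combined. As an illustration only, one common way to degrade a clean recording is to add noise scaled to a chosen signal-to-noise ratio (SNR). The sketch below assumes mono waveforms held as NumPy arrays; the function name `mix_at_snr` and the SNR-based scaling are assumptions made for this example, not the authors' procedure.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the mixture has the requested SNR in dB.

    Hypothetical helper for illustration; not taken from the paper.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both be non-silent")

    # Solve 10 * log10(speech_power / (gain**2 * noise_power)) == snr_db for gain.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise
```

Lower values of `snr_db` give a more heavily masked signal, so sweeping this parameter would be one way to produce graded levels of degradation from the same clean recording.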

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4eb/9886254/e4252b5d88bd/pone.0281079.g001.jpg
