
Comparing Emotion Recognition and Word Recognition in Background Noise.

Affiliations

Department of Otolaryngology - Head and Neck Surgery and Communicative Disorders, University of Louisville, KY.

Publication Information

J Speech Lang Hear Res. 2021 May 11;64(5):1758-1772. doi: 10.1044/2021_JSLHR-20-00153. Epub 2021 Apr 8.

Abstract

Purpose: Word recognition in quiet and in background noise has been thoroughly investigated in previous research to establish segmental speech recognition performance as a function of stimulus characteristics (e.g., audibility). Similar methods to investigate recognition performance for suprasegmental information (e.g., acoustic cues used to make judgments of talker age, sex, or emotional state) have not been performed. In this work, we directly compared emotion and word recognition performance in different levels of background noise to identify psychoacoustic properties of emotion recognition (globally and for specific emotion categories) relative to word recognition.

Method: Twenty young adult listeners with normal hearing listened to sentences and either reported a target word in each sentence or selected the emotion of the talker from a list of options (angry, calm, happy, and sad) at four signal-to-noise ratios in a background of white noise. Psychometric functions were fit to the recognition data and used to estimate thresholds (midway points on the function) and slopes for word and emotion recognition.

Results: Thresholds for emotion recognition were approximately 10 dB better than word recognition thresholds, and slopes for emotion recognition were half of those measured for word recognition. Low-arousal emotions had poorer thresholds and shallower slopes than high-arousal emotions, suggesting greater confusion when distinguishing low-arousal emotional speech content.

Conclusions: Communication of a talker's emotional state continues to be perceptible to listeners in competitive listening environments, even after words are rendered inaudible. The arousal of emotional speech affects listeners' ability to discriminate between emotion categories.
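The Method describes fitting psychometric functions to recognition data to estimate a threshold (the SNR at the midpoint of the function) and a slope. As an illustration only, the sketch below fits a logistic psychometric function to hypothetical proportion-correct data at four SNRs; the logistic form, the data values, and the grid-search fitting procedure are assumptions for demonstration, not the authors' actual analysis.

```python
import numpy as np

def psychometric(snr, threshold, slope):
    # Logistic psychometric function: proportion correct as a function
    # of SNR (dB). `threshold` is the midpoint (50% point); `slope`
    # controls how steeply performance rises around it.
    return 1.0 / (1.0 + np.exp(-slope * (snr - threshold)))

# Hypothetical recognition data: proportion correct at four SNRs.
snrs = np.array([-15.0, -10.0, -5.0, 0.0])
observed = np.array([0.10, 0.35, 0.80, 0.95])

# Simple least-squares grid search for the best threshold and slope.
best = None
for th in np.arange(-20.0, 5.0, 0.1):
    for sl in np.arange(0.05, 2.0, 0.05):
        err = np.sum((psychometric(snrs, th, sl) - observed) ** 2)
        if best is None or err < best[0]:
            best = (err, th, sl)

_, threshold, slope = best
print(f"threshold ~ {threshold:.1f} dB SNR, slope ~ {slope:.2f}")
```

With estimates like these, the study's comparison amounts to contrasting the fitted `threshold` and `slope` parameters across the word and emotion recognition tasks.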

