Modeling talker- and listener-based sources of variability in babble-induced consonant confusions.

Author information

Silbert Noah H, Motlagh Zadeh Lina

Affiliation

Communication Sciences and Disorders, University of Cincinnati, Cincinnati, Ohio, 45267, USA.

Publication information

J Acoust Soc Am. 2018 May;143(5):2780. doi: 10.1121/1.5037091.

DOI:10.1121/1.5037091

PMID:29857734

Abstract

Speech communication often occurs in the presence of noise. Patterns of perceptual errors induced by background noise are influenced by properties of the listener and of the noise and target speech. The present study introduces a modification of multilevel general recognition theory in which talker- and listener-based variability in confusion patterns is modeled as global or dimension-specific scaling of shared, group-level perceptual distributions. Listener-specific perceptual correlations and response bias are also modeled as random variables. This model is applied to identification-confusion data from 11 listeners' identifications of ten tokens of each of four consonant categories ([t], [d], [s], and [z]) produced by 20 talkers in CV syllables and masked by 10-talker babble. The results indicate that dimension-specific scaling for both listeners and talkers provides a good account of confusion patterns. These findings are discussed in relation to other recent research showing substantial listener-, talker-, and token-based sources of variability in noise-masked speech perception.
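
To make the idea of dimension-specific scaling concrete, the following is a minimal simulation sketch, not the authors' fitted model. It assumes a two-dimensional perceptual space (labeled here as voicing and manner), unit-variance Gaussian perceptual noise, hypothetical group-level means, talker and listener scale factors that multiply together on each dimension, and a single listener-level correlation `rho`. All names and numeric values (`GROUP_MEANS`, `confusion_matrix`, the scale tuples) are illustrative assumptions.

```python
import math
import random

# Hypothetical group-level perceptual means on two assumed dimensions
# (voicing, manner). Illustrative values only, not fitted estimates.
GROUP_MEANS = {
    "t": (-1.0, -1.0),
    "d": (1.0, -1.0),
    "s": (-1.0, 1.0),
    "z": (1.0, 1.0),
}
LABELS = ["t", "d", "s", "z"]


def respond(x, y, bounds):
    """Map a percept to a label using one decision bound per dimension."""
    voiced = x > bounds[0]
    fricative = y > bounds[1]
    if fricative:
        return "z" if voiced else "s"
    return "d" if voiced else "t"


def confusion_matrix(talker_scale, listener_scale, rho=0.0,
                     bounds=(0.0, 0.0), n=20000, seed=0):
    """Estimate P(response | stimulus) by Monte Carlo sampling.

    Assumption: talker and listener scales multiply the shared group
    means on each dimension; shifting `bounds` stands in for response bias.
    """
    rng = random.Random(seed)
    sx = talker_scale[0] * listener_scale[0]
    sy = talker_scale[1] * listener_scale[1]
    resid = math.sqrt(1.0 - rho ** 2)
    matrix = {}
    for stim in LABELS:
        mx, my = GROUP_MEANS[stim]
        counts = dict.fromkeys(LABELS, 0)
        for _ in range(n):
            z1 = rng.gauss(0.0, 1.0)
            z2 = rng.gauss(0.0, 1.0)
            x = sx * mx + z1                      # scaled mean plus noise
            y = sy * my + rho * z1 + resid * z2   # noise correlated with x
            counts[respond(x, y, bounds)] += 1
        matrix[stim] = {r: c / n for r, c in counts.items()}
    return matrix


if __name__ == "__main__":
    # A talker whose voicing cue is salient but whose manner cue is weak.
    m = confusion_matrix(talker_scale=(1.5, 0.5), listener_scale=(1.0, 1.0))
    for stim in LABELS:
        print(stim, {r: round(p, 3) for r, p in m[stim].items()})
```

In this sketch, a scale above 1 on a dimension pulls the category means apart relative to the noise, so errors along that dimension become rarer. Unequal scales across dimensions therefore produce asymmetric confusion patterns, which is the kind of variability that dimension-specific scaling is meant to capture.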
