Régnier Marion S, Allen Jont B
ECE Department and The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, 405 North Mathews, Urbana, Illinois 61801, USA.
J Acoust Soc Am. 2008 May;123(5):2801-14. doi: 10.1121/1.2897915.
This study focuses on correlating speech confusion patterns, defined as consonant-vowel confusion as a function of the speech-to-noise ratio, and a model acoustic feature (AF) representation called the AI-gram, defined as the articulation index density in the spectrotemporal domain. By collecting many responses from many talkers and listeners, the AF and psychophysical feature (event) are shown to be correlated via the AI-gram model and the confusion matrices at the utterance level, thereby explaining the listener confusion. Consonant /t/ is used as an example to identify its primary robust-to-noise feature, and a precise correlation of the acoustic information with the listeners' confusions is used to label the event. The main spectrotemporal cue defining the /t/ event is an across-frequency temporal coincidence, wherein frequency spread and robustness vary across utterances, while the event remains invariant. The cross-frequency timing event is shown to be the key perceptual feature for consonants in a vowel following context. Coincidences are found to form the basic element of the auditory object. Neural circuits used for coincidence in binaural processing for localization across ears are proposed to be used within one ear across channels. It is further concluded that the event is based on the audibility of the /t/ burst rather than on any superthreshold property.