Evaluation of various sets of acoustic cues for the perception of prevocalic stop consonants. II. Modeling and evaluation.

Author information

Smits R, ten Bosch L, Collier R

Affiliation

Institute for Perception Research (IPO), Eindhoven, The Netherlands.

Publication information

J Acoust Soc Am. 1996 Dec;100(6):3865-81. doi: 10.1121/1.417242.

DOI:10.1121/1.417242

PMID:8969487

Abstract

The purpose of the study presented in this paper and the accompanying paper [Smits et al., J. Acoust. Soc. Am. 100, 3852-3864 (1996)] is to evaluate whether detailed or gross time-frequency structures are more relevant for the perception of prevocalic stop consonants. To this end, first a perception experiment was carried out with "burst-spliced" stop-vowel utterances. This experiment is described in the accompanying paper. The present paper describes the second part of the investigation, i.e., the simulation of the behavior of the listeners in the perception experiment. First, a number of detailed and gross cues are measured on the stimuli. Next, these cues are mapped onto the observed perceptual data using a formal model of human classification behavior. The results show that in all cases the detailed cues, such as formant transitions, give a better account of the perceptual data than the gross cues, such as the global spectral tilt and its initial change. The best-performing models are interpreted in terms of the acoustic boundaries which are associated with the perceived linguistic contrast. These boundaries are highly interpretable linear functions of five or six acoustic cues, which give a quantitative description of the often-discussed "trade-off" relation between the various cues for perception of place of articulation in stop consonants.
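Illustrative sketch (not from the paper): the abstract does not specify the form of the classification model, so the code below assumes a multinomial logistic (softmax) classifier as one plausible way to map cue values onto response proportions. The cue values, the response proportions, and all function names are hypothetical. With this assumed model, the boundary between any two response categories is a linear function of the cues, which is the kind of relation the abstract describes.

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(cues, weights, biases):
    """Response probabilities for one stimulus from linear functions of its cues."""
    logits = [sum(w * x for w, x in zip(w_k, cues)) + b_k
              for w_k, b_k in zip(weights, biases)]
    return softmax(logits)

def cross_entropy(stimuli, props, weights, biases):
    """Mean cross-entropy between observed and modeled response proportions."""
    total = 0.0
    for x, p_obs in zip(stimuli, props):
        p_mod = predict(x, weights, biases)
        total -= sum(o * math.log(m) for o, m in zip(p_obs, p_mod))
    return total / len(stimuli)

def fit(stimuli, props, lr=0.5, steps=2000):
    """Fit weights and biases by plain gradient descent on the cross-entropy."""
    n, n_classes, n_cues = len(stimuli), len(props[0]), len(stimuli[0])
    weights = [[0.0] * n_cues for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(steps):
        grad_w = [[0.0] * n_cues for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, p_obs in zip(stimuli, props):
            p_mod = predict(x, weights, biases)
            for k in range(n_classes):
                err = p_mod[k] - p_obs[k]  # gradient w.r.t. logit k
                grad_b[k] += err
                for j in range(n_cues):
                    grad_w[k][j] += err * x[j]
        for k in range(n_classes):
            biases[k] -= lr * grad_b[k] / n
            for j in range(n_cues):
                weights[k][j] -= lr * grad_w[k][j] / n
    return weights, biases

def boundary(weights, biases, i, j):
    """Coefficients (w, c) of the linear boundary w.x + c = 0 between classes i and j."""
    w = [a - b for a, b in zip(weights[i], weights[j])]
    return w, biases[i] - biases[j]

# Hypothetical standardized values of two invented cues for six stimuli.
stimuli = [[-1.0, 0.5], [-0.8, -0.4], [0.0, 1.0],
           [0.2, 0.9], [1.0, -0.6], [0.8, -0.9]]
# Hypothetical proportions of /b/, /d/, /g/ responses for each stimulus.
props = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1],
         [0.2, 0.7, 0.1], [0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]

weights, biases = fit(stimuli, props)
w_bd, c_bd = boundary(weights, biases, 0, 1)  # assumed /b/ vs /d/ boundary
```

In this sketch the ratio of the entries of `w_bd` indicates how much one cue must change to offset a change in the other while staying on the boundary, which is one way to read a "trade-off" between cues quantitatively.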
