Bladon R A, Lindblom B
J Acoust Soc Am. 1981 May;69(5):1414-22. doi: 10.1121/1.385824.
The hypothesis of this study is that the auditory cues relevant to listeners' judgment of vowel quality are a spectral representation of loudness density versus pitch. A model is described that generates such patterns for steady-state vowels. In addition to the nonlinear transformations underlying the loudness density and pitch scales, it incorporates experimentally established characteristics associated with frequency resolution and masking, such as the critical band concept. This model is combined with a measure of auditory perceptual distance which, operating on pairs of vowels, treats each stimulus representation as a single spectral shape. In order to test the distance metric and the model, experimental data were gathered from listeners' numerical estimates of quality differences between stimulus pairs which compared four-formant and two-formant vowels. The correlation between experimental and theoretical results was 0.89. We interpret this value to indicate that the present definition of auditory cue and auditory distance can be said to account for the experimental behavior of our listeners only in a rather gross fashion. On the other hand, the theory was developed on the basis of rather conservative assumptions about the nature of auditory cues. For instance, the model ignores the possibility of temporal coding and certain nonlinear effects, and it does not pay special attention to spectral peaks. Seen in that light, the agreement between observed and predicted auditory distance is remarkably good.
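The two ingredients named in the abstract, a pitch-like frequency axis and a distance that treats each vowel representation as a single spectral shape, can be illustrated with a minimal sketch. The abstract does not give the formulas used in the study, so everything below is an assumption made for illustration: the critical-band (Bark) conversion uses the Zwicker and Terhardt (1980) approximation, the distance is a plain Euclidean distance between two loudness-density patterns sampled at equal steps along the Bark axis, and the function names `hz_to_bark` and `spectral_distance` are hypothetical. The masking and loudness transformations that the model incorporates are not reproduced here.

```python
import math


def hz_to_bark(f_hz):
    """Approximate critical-band rate in Bark for a frequency in Hz.

    Assumption: Zwicker and Terhardt (1980) approximation; the formula
    used in the study itself is not stated in the abstract.
    """
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def spectral_distance(pattern_a, pattern_b, dz):
    """Euclidean distance between two patterns sampled every dz Bark.

    Assumption: each pattern is a list of loudness-density values on the
    same Bark grid, compared as a whole shape with no special weight
    given to spectral peaks.
    """
    if len(pattern_a) != len(pattern_b):
        raise ValueError("patterns must be sampled on the same Bark grid")
    squared = sum((a - b) ** 2 for a, b in zip(pattern_a, pattern_b))
    return math.sqrt(squared * dz)


if __name__ == "__main__":
    # Hypothetical loudness-density samples for two vowels (not data from the study).
    vowel_1 = [0.2, 1.1, 2.4, 1.8, 0.9, 0.3]
    vowel_2 = [0.1, 0.9, 2.0, 2.2, 1.2, 0.4]
    print(hz_to_bark(1000.0))
    print(spectral_distance(vowel_1, vowel_2, dz=0.5))
```

Distances of this kind, computed for every stimulus pair, would then be correlated with listeners' numerical estimates of quality difference, which is the comparison the reported value of 0.89 refers to.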