Syrdal A K, Gopal H S
J Acoust Soc Am. 1986 Apr;79(4):1086-100. doi: 10.1121/1.393381.
A quantitative perceptual model of human vowel recognition based upon psychoacoustic and speech perception data is described. At an intermediate auditory stage of processing, the specific bark difference level of the model represents the pattern of peripheral auditory excitation as the distance in critical bands (barks) between neighboring formants and between the fundamental frequency (F0) and first formant (F1). At a higher, phonetic stage of processing, represented by the critical bark difference level of the model, the transformed vowels may be dichotomously classified based on whether the difference between formants in each dimension falls within or exceeds the critical distance of 3 bark for the spectral center of gravity effect [Chistovich et al., Hear. Res. 1, 185-195 (1979)]. Vowel transformations and classifications correspond well to several major phonetic dimensions and features by which vowels are perceived and traditionally classified. The F1-F0 dimension represents vowel height, and high vowels have F1-F0 differences within 3 bark. The F3-F2 dimension corresponds to vowel place of articulation, and front vowels have F3-F2 differences of less than 3 bark. As an inherent, speaker-independent normalization procedure, the model provides excellent vowel clustering while it greatly reduces between-speaker variability. It offers robust normalization through feature classification because gross binary categorization allows for considerable acoustic variability. There was generally less formant and bark difference variability for closely spaced formants than for widely spaced formants. These findings agree with independently observed perceptual results and support Stevens' quantal theory of vowel production and perceptual constraints on production predicted from the critical bark difference level of the model.
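The two-stage classification described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say which Hz-to-bark conversion was used, so the sketch substitutes the Traunmüller (1990) approximation. The function names and the formant values in the usage note are illustrative only. Treating "within 3 bark" as a strict less-than comparison is also an assumption.

```python
CRITICAL_DISTANCE_BARK = 3.0  # critical distance cited in the abstract


def hz_to_bark(f_hz):
    """Convert frequency in Hz to bark.

    Assumption: Traunmueller (1990) approximation, without its
    low-frequency correction. The abstract does not name a formula.
    """
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_differences(f0, f1, f2, f3):
    """Bark distances between F0 and F1 and between neighboring formants."""
    z0, z1, z2, z3 = (hz_to_bark(f) for f in (f0, f1, f2, f3))
    return {"F1-F0": z1 - z0, "F2-F1": z2 - z1, "F3-F2": z3 - z2}


def classify_vowel(f0, f1, f2, f3):
    """Binary features from the critical-distance test.

    high:  F1-F0 difference below 3 bark (vowel height dimension)
    front: F3-F2 difference below 3 bark (place of articulation dimension)
    """
    d = bark_differences(f0, f1, f2, f3)
    return {
        "high": d["F1-F0"] < CRITICAL_DISTANCE_BARK,
        "front": d["F3-F2"] < CRITICAL_DISTANCE_BARK,
    }
```

With illustrative values for an /i/-like vowel (F0 = 130 Hz, F1 = 270 Hz, F2 = 2290 Hz, F3 = 3010 Hz), `classify_vowel` marks the vowel as both high and front. With values for an /ɑ/-like vowel (124, 730, 1090, 2440 Hz), it marks neither.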