Nearey T M
Department of Linguistics, University of Alberta, Edmonton, Canada.
J Acoust Soc Am. 1989 May;85(5):2088-113. doi: 10.1121/1.397861.
The present work reviews theories and empirical findings, including results from two new experiments, that bear on the perception of English vowels, with an emphasis on comparing data-analytic "machine recognition" approaches with results from speech perception experiments. Two major sources of variability (viz., speaker differences and consonantal context effects) are addressed from the classical perspective of overlap between vowel categories in F1 x F2 space. Various approaches to reducing this overlap are evaluated. Two types of speaker normalization are considered. "Intrinsic" methods, based on relationships among the steady-state properties (F0, F1, F2, and F3) within individual vowel tokens, are contrasted with "extrinsic" methods, which involve the relationships among the formant frequencies of the entire vowel system of a single speaker. Evidence from a new experiment supports Ainsworth's (1975) conclusion [W. Ainsworth, Auditory Analysis and Perception of Speech (Academic, London, 1975)] that both types of information have a role to play in perception. The effects of consonantal context on formant overlap are also considered. A new experiment is presented that extends Lindblom and Studdert-Kennedy's finding [B. Lindblom and M. Studdert-Kennedy, J. Acoust. Soc. Am. 42, 830-843 (1967)] of perceptual effects of consonantal context on vowel perception to /dVd/ and /bVb/ contexts. Finally, the role of vowel-inherent dynamic properties, including duration and diphthongization, is briefly reviewed. All of the above factors are shown to have reliable influences on vowel perception, although the relative weights of these effects, and the circumstances that alter those weights, remain far from clear. It is suggested that the design of more complex perceptual experiments, together with the development of quantitative pattern recognition models of human vowel perception, will be necessary to resolve these issues.
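The difference between intrinsic and extrinsic normalization can be made concrete with a small sketch. The formulas below are illustrative assumptions, not the procedures evaluated in this paper. The intrinsic function uses only one token at a time (log ratios F1/F0, F2/F1, F3/F2). The extrinsic function is a log-mean scheme: it expresses each token's log F1 and log F2 relative to the speaker's mean log formant frequency, pooled over F1 to F3 of all of that speaker's tokens. The function names `intrinsic` and `extrinsic` and all token values are hypothetical, made-up round numbers rather than measured data.

```python
import math
from statistics import fmean


def intrinsic(token):
    """Intrinsic scheme (illustrative): log ratios computed within ONE token only.

    token is (F0, F1, F2, F3) in Hz.
    """
    f0, f1, f2, f3 = token
    return (math.log(f1 / f0), math.log(f2 / f1), math.log(f3 / f2))


def extrinsic(tokens):
    """Extrinsic scheme (illustrative log-mean): needs the speaker's whole vowel set.

    The reference is the mean log frequency pooled over F1-F3 of every token;
    each token's log F1 and log F2 are returned relative to that reference.
    """
    pooled = [math.log(f) for (_f0, *formants) in tokens for f in formants]
    reference = fmean(pooled)
    return [(math.log(f1) - reference, math.log(f2) - reference)
            for (_f0, f1, f2, _f3) in tokens]


# Hypothetical tokens (made-up round numbers, not measurements).
speaker_a = [(120.0, 300.0, 2300.0, 3000.0),   # an /i/-like token
             (120.0, 700.0, 1200.0, 2500.0)]   # an /a/-like token
# A second hypothetical speaker whose every frequency is 20% higher.
speaker_b = [tuple(1.2 * f for f in tok) for tok in speaker_a]

if __name__ == "__main__":
    for a, b in zip(speaker_a, speaker_b):
        print("intrinsic:", intrinsic(a), intrinsic(b))
    print("extrinsic A:", extrinsic(speaker_a))
    print("extrinsic B:", extrinsic(speaker_b))
```

Under this toy assumption, a single uniform scale factor cancels in both schemes, but for different reasons: within each token for the intrinsic ratios, and across the speaker's vowel system for the log-mean reference. Real speaker differences are not a single uniform scale factor (and F0 does not scale in lockstep with the formants), so the two kinds of information can diverge, which is the situation the experiments above address.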