Department of Head and Neck Surgery, UCLA School of Medicine, 1000 Veteran Avenue, Los Angeles, California 90095-1794, USA.
Department of Linguistics, University of California, Los Angeles, 3125 Campbell Hall, Box 951543, Los Angeles, California 90095-1543, USA.
J Acoust Soc Am. 2019 Sep;146(3):1568. doi: 10.1121/1.5125134.
Little is known about the nature or extent of everyday variability in voice quality. This paper describes a series of principal component analyses to explore within- and between-talker acoustic variation and the extent to which they conform to expectations derived from current models of voice perception. Based on studies of faces and cognitive models of speaker recognition, the authors hypothesized that a few measures would be important across speakers, but that much of within-speaker variability would be idiosyncratic. Analyses used multiple sentence productions from 50 female and 50 male speakers of English, recorded over three days. Twenty-six acoustic variables from a psychoacoustic model of voice quality were measured every 5 ms on vowels and approximants. Across speakers the balance between higher harmonic amplitudes and inharmonic energy in the voice accounted for the most variance (females = 20%, males = 22%). Formant frequencies and their variability accounted for an additional 12% of variance across speakers. Remaining variance appeared largely idiosyncratic, suggesting that the speaker-specific voice space is different for different people. Results further showed that voice spaces for individuals and for the population of talkers have very similar acoustic structures. Implications for prototype models of voice perception and recognition are discussed.
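The kind of analysis described above, a principal component analysis over frame-level acoustic measurements, can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the function name `pca_explained_variance`, the z-scoring of each variable, and the random input matrix are all assumptions made for the example. Only the count of 26 variables comes from the abstract.

```python
import numpy as np


def pca_explained_variance(frames):
    """Return (explained-variance ratios, component loadings) for a
    (n_frames, n_variables) matrix of measurements.

    Assumption: each variable is z-scored before the decomposition so that
    variables on different scales contribute comparably.
    """
    X = np.asarray(frames, dtype=float)
    X = X - X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    Z = X / std

    # SVD of the standardized data: rows of vt are the principal axes.
    _, singular_values, vt = np.linalg.svd(Z, full_matrices=False)
    variances = singular_values ** 2 / (Z.shape[0] - 1)
    return variances / variances.sum(), vt


# Synthetic stand-in for one talker's frames: 26 hypothetical acoustic
# variables driven partly by a single shared latent factor plus noise.
rng = np.random.default_rng(0)
n_frames, n_vars = 2000, 26
latent = rng.normal(size=(n_frames, 1))
loadings = rng.normal(size=(1, n_vars))
frames = latent @ loadings + rng.normal(size=(n_frames, n_vars))

ratios, components = pca_explained_variance(frames)
print("Share of variance on the first component:", round(float(ratios[0]), 3))
```

Running the same function once per talker and once on the pooled data would give one rough way to compare individual and population-level structure, though the component selection and rotation used in the study itself are not stated in the abstract.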