Kreiman Jody, Garellek Marc, Chen Gang, Alwan Abeer, Gerratt Bruce R
Department of Head and Neck Surgery, University of California-Los Angeles School of Medicine, 31-24 Rehabilitation Center, Los Angeles, California 90095-1794, USA.
Department of Linguistics, University of California-San Diego, 9500 Gilman Drive #0108, La Jolla, California 92093-0108, USA.
J Acoust Soc Am. 2015 Jul;138(1):1-10. doi: 10.1121/1.4922174.
Models of the voice source differ in their fits to natural voices, but it is unclear which differences in fit are perceptually salient. This study examined the relationship between the fit of five voice source models to 40 natural voices and the degree of perceptual match among stimuli synthesized with each of the modeled sources. Listeners completed a visual sort-and-rate task to compare versions of each voice created with the different source models, and the results were analyzed using multidimensional scaling. Neither fits to pulse shapes nor fits to landmark points on the pulses predicted observed differences in quality. Further, the source models fit the opening phase of the glottal pulses better than the closing phase, yet similarity in quality was better predicted by the timing and amplitude of the negative peak of the flow derivative (part of the closing phase) than by the timing and/or amplitude of peak glottal opening. These results indicate that simply knowing how, or how well, a particular source model fits a target source pulse in the time domain provides little insight into which aspects of the voice source matter to listeners.