Perceptual evaluation of voice source models.

Authors

Kreiman Jody, Garellek Marc, Chen Gang, Alwan Abeer, Gerratt Bruce R

Affiliations

Department of Head and Neck Surgery, University of California-Los Angeles School of Medicine, 31-24 Rehabilitation Center, Los Angeles, California 90095-1794, USA.

Department of Linguistics, University of California-San Diego, 9500 Gilman Drive #0108, La Jolla, California 92093-0108, USA.

Publication information

J Acoust Soc Am. 2015 Jul;138(1):1-10. doi: 10.1121/1.4922174.

DOI:10.1121/1.4922174

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4491021/

Abstract

Models of the voice source differ in their fits to natural voices, but it is unclear which differences in fit are perceptually salient. This study examined the relationship between the fit of five voice source models to 40 natural voices, and the degree of perceptual match among stimuli synthesized with each of the modeled sources. Listeners completed a visual sort-and-rate task to compare versions of each voice created with the different source models, and the results were analyzed using multidimensional scaling. Neither fits to pulse shapes nor fits to landmark points on the pulses predicted observed differences in quality. Further, the source models fit the opening phase of the glottal pulses better than they fit the closing phase, but at the same time similarity in quality was better predicted by the timing and amplitude of the negative peak of the flow derivative (part of the closing phase) than by the timing and/or amplitude of peak glottal opening. Results indicate that simply knowing how (or how well) a particular source model fits or does not fit a target source pulse in the time domain provides little insight into what aspects of the voice source are important to listeners.
