Assessing accuracy of resonances obtained with reassigned spectrograms from the "ground truth" of physical vocal tract models.

Authors

Shadle Christine H, Fulop Sean A, Chen Wei-Rong, Whalen D H

Affiliations

Yale Child Study Center, School of Medicine, Yale University, New Haven, Connecticut 06511, USA.

Department of Linguistics, Fresno State University, Fresno, California 93740, USA.

Publication information

J Acoust Soc Am. 2024 Feb 1;155(2):1253-1263. doi: 10.1121/10.0024548.

DOI:10.1121/10.0024548

PMID:38341748

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10858790/

Abstract

The reassigned spectrogram (RS) has emerged as the most accurate way to infer vocal tract resonances from the acoustic signal [Shadle, Nam, and Whalen (2016). "Comparing measurement errors for formants in synthetic and natural vowels," J. Acoust. Soc. Am. 139(2), 713-727]. To date, validating its accuracy has depended on formant synthesis for ground truth values of these resonances. Synthesis is easily controlled, but it has many intrinsic assumptions that do not necessarily accurately realize the acoustics in the way that physical resonances would. Here, we show that physical models of the vocal tract with derivable resonance values allow a separate approach to the ground truth, with a different range of limitations. Our three-dimensional printed vocal tract models were excited by white noise, allowing an accurate determination of the resonance frequencies. Then, sources with a range of fundamental frequencies were implemented, allowing a direct assessment of whether RS avoided the systematic bias towards the nearest strong harmonic to which other analysis techniques are prone. RS was indeed accurate at fundamental frequencies up to 300 Hz; above that, accuracy was somewhat reduced. Future directions include testing mechanical models with the dimensions of children's vocal tracts and making RS more broadly useful by automating the detection of resonances.
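The harmonic bias mentioned in the abstract can be illustrated with simple arithmetic. The sketch below is not the authors' analysis code and does not compute a reassigned spectrogram. It only shows how far a resonance estimate could be pulled if an analysis method snapped to the nearest harmonic of the source. The function names and the 500 Hz resonance are hypothetical values chosen for illustration, not measurements from the paper.

```python
def nearest_harmonic(resonance_hz, f0_hz):
    """Return the harmonic of f0_hz closest to resonance_hz.

    Harmonics are k * f0_hz for k = 1, 2, 3, ...; k is clamped to at least 1.
    """
    k = max(1, round(resonance_hz / f0_hz))
    return k * f0_hz


def harmonic_bias(resonance_hz, f0_hz):
    """Signed error (Hz) of an estimate that lands on the nearest harmonic."""
    return nearest_harmonic(resonance_hz, f0_hz) - resonance_hz


if __name__ == "__main__":
    resonance = 500  # hypothetical resonance in Hz, for illustration only
    for f0 in (100, 200, 300):
        print(f0, nearest_harmonic(resonance, f0), harmonic_bias(resonance, f0))
```

Because harmonics are spaced f0 apart, the largest possible offset to the nearest harmonic is about f0/2 for resonances above the fundamental. This is one way to see why a higher fundamental frequency makes resonance estimation harder for harmonic-biased methods.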
