On Short-Time Estimation of Vocal Tract Length from Formant Frequencies.

Author information

Lammert Adam C, Narayanan Shrikanth S

Affiliations

Computer Science Department, Swarthmore College, Swarthmore, PA, United States of America.

Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, CA, United States of America.

Publication information

PLoS One. 2015 Jul 15;10(7):e0132193. doi: 10.1371/journal.pone.0132193. eCollection 2015.

DOI: 10.1371/journal.pone.0132193

PMID: 26177102

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4503663/

Abstract

Vocal tract length is highly variable across speakers and determines many aspects of the acoustic speech signal, making it an essential parameter to consider for explaining behavioral variability. A method for accurate estimation of vocal tract length from formant frequencies would afford normalization of interspeaker variability and facilitate acoustic comparisons across speakers. A framework for considering estimation methods is developed from the basic principles of vocal tract acoustics, and an estimation method is proposed that follows naturally from this framework. The proposed method is evaluated using acoustic characteristics of simulated vocal tracts ranging from 14 to 19 cm in length, as well as real-time magnetic resonance imaging data with synchronous audio from five speakers whose vocal tracts range from 14.5 to 18.0 cm in length. Evaluations show improvements in accuracy over previously proposed methods, with 0.631 and 1.277 cm root mean square error on simulated and human speech data, respectively. Empirical results show that the effectiveness of the proposed method is based on emphasizing higher formant frequencies, which seem less affected by speech articulation. Theoretical predictions of formant sensitivity reinforce this empirical finding. Moreover, theoretical insights are explained regarding the reason for differences in formant sensitivity.
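The abstract rests on the fact that formant frequencies scale inversely with vocal tract length. The textbook way to see this is the uniform closed-open tube (quarter-wavelength resonator), where the n-th resonance is F_n = (2n - 1)c / (4L), so each formant yields its own length estimate L = (2n - 1)c / (4F_n). The sketch below implements that baseline only; it is not the authors' proposed estimator. The speed of sound (35,000 cm/s), the helper names `vtl_per_formant`, `vtl_estimate` and `rmse`, and the optional per-formant weights are all hypothetical choices for illustration. Weighting higher formants more heavily only loosely echoes the abstract's finding that higher formants seem less affected by articulation.

```python
import numpy as np

# Assumption: approximate speed of sound in the warm, humid vocal tract, in cm/s.
SPEED_OF_SOUND_CM_S = 35000.0


def vtl_per_formant(formants_hz, c=SPEED_OF_SOUND_CM_S):
    """Per-formant length estimates (cm) from the uniform closed-open tube model.

    F_n = (2n - 1) c / (4 L)  =>  L_n = (2n - 1) c / (4 F_n), with n = 1, 2, ...
    """
    f = np.asarray(formants_hz, dtype=float)
    n = np.arange(1, f.size + 1)          # formant index, starting at F1
    return (2 * n - 1) * c / (4 * f)


def vtl_estimate(formants_hz, weights=None, c=SPEED_OF_SOUND_CM_S):
    """Weighted mean of the per-formant estimates (hypothetical weighting scheme)."""
    lengths = vtl_per_formant(formants_hz, c)
    w = np.ones_like(lengths) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sum(w * lengths) / np.sum(w))


def rmse(estimates_cm, targets_cm):
    """Root mean square error, the accuracy metric reported in the abstract."""
    err = np.asarray(estimates_cm, dtype=float) - np.asarray(targets_cm, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


if __name__ == "__main__":
    # An ideal 17.5 cm uniform tube resonates at 500, 1500, 2500 and 3500 Hz.
    formants = [500.0, 1500.0, 2500.0, 3500.0]
    print(vtl_per_formant(formants))                      # every entry is 17.5
    print(vtl_estimate(formants, weights=[1, 2, 3, 4]))   # higher formants weighted more
```

For a perfectly uniform tube every formant agrees, so the weighting is irrelevant; for real vowels the per-formant estimates diverge because articulation perturbs the lower formants, which is where any choice of weights starts to matter.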


Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/032d/4503663/c5ae7c291527/pone.0132193.g001.jpg

Similar articles

A model of acoustic interspeaker variability based on the concept of formant-cavity affiliation.

J Acoust Soc Am. 2004 Jan;115(1):337-51. doi: 10.1121/1.1631946.

Interspeaker variability in hard palate morphology and vowel production.

J Speech Lang Hear Res. 2013 Dec;56(6):S1924-33. doi: 10.1044/1092-4388(2013/12-0211).

Relation of vocal tract shape, formant transitions, and stop consonant identification.

J Speech Lang Hear Res. 2010 Dec;53(6):1514-28. doi: 10.1044/1092-4388(2010/09-0127). Epub 2010 Jul 19.

Changes in the human vocal tract due to aging and the acoustic correlates of speech production: a pilot study.

J Speech Lang Hear Res. 2003 Jun;46(3):689-701. doi: 10.1044/1092-4388(2003/054).

Comparison of magnetic resonance imaging-based vocal tract area functions obtained from the same speaker in 1994 and 2002.

J Acoust Soc Am. 2008 Jan;123(1):327-35. doi: 10.1121/1.2805683.

A practical guide to calculating vocal tract length and scale-invariant formant patterns.

Behav Res Methods. 2024 Sep;56(6):5588-5604. doi: 10.3758/s13428-023-02288-x. Epub 2023 Dec 29.

High-Resolution, Non-Invasive Imaging of Upper Vocal Tract Articulators Compatible with Human Brain Recordings.

PLoS One. 2016 Mar 28;11(3):e0151327. doi: 10.1371/journal.pone.0151327. eCollection 2016.

Perception of synthetic vowel exemplars of 4-year-old children and estimation of their corresponding vocal tract shapes.

J Acoust Soc Am. 2006 Nov;120(5 Pt 1):2850-8. doi: 10.1121/1.2345833.

A magnetic resonance imaging-based articulatory and acoustic study of "retroflex" and "bunched" American English /r/.

J Acoust Soc Am. 2008 Jun;123(6):4466-81. doi: 10.1121/1.2902168.

Cited by

Detection of Suicide Risk Using Vocal Characteristics: Systematic Review.

JMIR Biomed Eng. 2022 Dec 22;7(2):e42386. doi: 10.2196/42386.

A practical guide to calculating vocal tract length and scale-invariant formant patterns.

Behav Res Methods. 2024 Sep;56(6):5588-5604. doi: 10.3758/s13428-023-02288-x. Epub 2023 Dec 29.

Evaluating normalization accounts against the dense vowel space of Central Swedish.

Front Psychol. 2023 Jun 21;14:1165742. doi: 10.3389/fpsyg.2023.1165742. eCollection 2023.

Impacts of Development, Dentofacial Disharmony, and Its Surgical Correction on Speech: A Narrative Review for Dental Professionals.

Appl Sci (Basel). 2023 May;13(9). doi: 10.3390/app13095496. Epub 2023 Apr 28.

A mixed-method feasibility study of the use of the Complete Vocal Technique (CVT), a pedagogic method to improve the voice and vocal function in singers and actors, in the treatment of patients with muscle tension dysphonia: a study protocol.

Pilot Feasibility Stud. 2023 May 24;9(1):88. doi: 10.1186/s40814-023-01317-y.

Difficulties Experienced by Older Listeners in Utilizing Voice Cues for Speaker Discrimination.

Front Psychol. 2022 Mar 3;13:797422. doi: 10.3389/fpsyg.2022.797422. eCollection 2022.

Individual differences in vocal size exaggeration.

Sci Rep. 2022 Feb 16;12(1):2611. doi: 10.1038/s41598-022-05170-6.

A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images.

Sci Data. 2021 Jul 20;8(1):187. doi: 10.1038/s41597-021-00976-x.

Variability in individual constriction contributions to third formant values in American English /ɹ/.

J Acoust Soc Am. 2020 Jun;147(6):3905. doi: 10.1121/10.0001413.

Formant-Estimated Vocal Tract Length and Extrinsic Laryngeal Muscle Activation During Modulation of Vocal Effort in Healthy Speakers.

J Speech Lang Hear Res. 2020 May 22;63(5):1395-1403. doi: 10.1044/2020_JSLHR-19-00234. Epub 2020 May 7.
