Story Brad H, Maxfield Lynn, Palaparthi Anil, Ferguson Sarah Hargus, Titze Ingo
Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA.
Utah Center for Vocology, University of Utah, Salt Lake City, Utah 84112, USA.
J Acoust Soc Am. 2025 Sep 1;158(3):2207-2224. doi: 10.1121/10.0039348.
The purpose of this study was to investigate the degree to which the coupling between the oscillating sound source and the vocal tract filter occurs in connected speech samples, and to provide insight into how humans may choose to deploy this coupling for intelligibility, intensity, or both. A technique was developed to extract, from minutes-long speech samples, the time-dependent fundamental frequency (fo) and the first two formant frequencies (F1 and F2) to permit an analysis that determines whether a talker aligns a voice source harmonic with a vocal tract resonance, and also measures a normalized vowel space area. The accuracy of the processing method was validated by applying it to a set of audio samples generated via speech simulation that provided "ground-truth" data. It was then applied to a 41-talker database of clear and conversational speech. Results indicated that talkers make adjustments for different speaking styles that include not only increased vowel space area but also alignment of harmonics and formant frequencies, although future work is needed to determine whether these adjustments are directed toward maximizing transfer of information or transfer of acoustic power.
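The two measurements named in the abstract, harmonic-formant alignment and vowel space area, can be illustrated with a minimal sketch. It is not the authors' processing method, which the abstract does not detail. It assumes frame-wise tracks of fo, F1, and F2 in Hz are already available. The function names, the 0.1-harmonic tolerance, the convex-hull definition of area, and the median-based normalization are all hypothetical choices made for illustration only.

```python
import numpy as np

def harmonic_alignment(fo, formant):
    """Nearest harmonic number and its offset from the formant.

    The offset is expressed in units of fo, so 0 means a harmonic sits
    exactly on the formant frequency.
    """
    fo = np.asarray(fo, dtype=float)
    formant = np.asarray(formant, dtype=float)
    n = np.maximum(1, np.round(formant / fo)).astype(int)
    deviation = (formant - n * fo) / fo
    return n, deviation

def aligned_fraction(fo, formant, tol=0.1):
    """Fraction of frames whose nearest harmonic lies within tol * fo
    of the formant. The tolerance is an assumed value."""
    _, deviation = harmonic_alignment(fo, formant)
    return float(np.mean(np.abs(deviation) <= tol))

def _cross(o, a, b):
    # z-component of the cross product (a - o) x (b - o)
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull_area(points):
    """Area of the convex hull of 2-D points (monotone chain + shoelace)."""
    pts = sorted(set(map(tuple, np.asarray(points, dtype=float))))
    if len(pts) < 3:
        return 0.0
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]
    x, y = np.array(hull).T
    return float(0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1))))

def normalized_vowel_space_area(f1, f2):
    """Hull area in the (F1, F2) plane after dividing each formant by its
    median. This normalization is an assumption, not the paper's."""
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    scaled = np.column_stack([f1 / np.median(f1), f2 / np.median(f2)])
    return convex_hull_area(scaled)
```

Under these assumptions, comparing `aligned_fraction(fo, F1)` and `normalized_vowel_space_area(F1, F2)` between a talker's clear and conversational recordings would give one rough indication of whether both kinds of adjustment occur. Dividing by the median makes the area insensitive to a uniform scaling of the formant values.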