Yousef Ahmed M, Hunter Eric J
Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA 52242, USA.
Bioengineering (Basel). 2024 Dec 11;11(12):1253. doi: 10.3390/bioengineering11121253.
Room reverberation can affect oral/aural communication and is especially critical in computer analysis of voice. High levels of reverberation can distort voice recordings, reducing the accuracy of voice quality quantification and vocal health evaluation. This study quantifies the impact of simulated reverberation added to otherwise clean voice recordings, as reflected in voice metrics commonly used for voice quality evaluation. Voice samples were drawn from a larger database of recordings collected in a low-noise, low-reverberation environment: a sustained [a:] vowel produced with two different speaker intents (comfortable and clear) by five vocally healthy, college-age, female native English speakers. Using the reverb effect in Audacity, eight reverberation conditions spanning a range of reverberation times (T20 between 0.004 and 1.82 s) were simulated and convolved with the original recordings. All voice samples, both original and reverberation-affected, were analyzed using the freely available PRAAT software (version 6.0.13) to calculate five common voice parameters: jitter, shimmer, harmonic-to-noise ratio (HNR), alpha ratio, and smoothed cepstral peak prominence (CPPs). Statistical analyses assessed how sensitive the voice metrics were to, and how much they varied across, the range of simulated room reverberation conditions. Results showed that jitter, HNR, and alpha ratio were stable at simulated reverberation times (T20) below 1 s, with HNR and jitter more stable in the clear vocal style. Shimmer was highly sensitive even at a T20 of 0.53 s, a value typical of a common room, while CPPs remained stable across all simulated reverberation conditions.
Understanding how sensitive or stable these voice metrics are across a range of room acoustic effects allows targeted use of certain metrics even in less controlled environments: stable measures such as CPPs can be applied selectively and shimmer interpreted with caution, supporting more reliable and accurate voice assessments.
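As an illustration of the simulation step described above (an impulse response with a known decay convolved with a clean recording), the following is a minimal sketch in Python with NumPy. It does not reproduce Audacity's reverb effect or the authors' processing chain. The exponentially decaying noise model, the function names (`synthetic_rir`, `estimate_t20`, `add_reverb`), the 16 kHz sampling rate, and the 200 Hz harmonic test tone standing in for a sustained vowel are all assumptions made for the example. T20 is estimated here in the conventional way, from a line fitted to the Schroeder decay curve between -5 and -25 dB and extrapolated to a 60 dB decay.

```python
import numpy as np


def synthetic_rir(rt60, fs, seed=0):
    """Hypothetical room impulse response: white noise under an exponential
    envelope whose level falls by 60 dB after rt60 seconds."""
    rng = np.random.default_rng(seed)
    n = int(1.5 * rt60 * fs)
    t = np.arange(n) / fs
    # Amplitude envelope 10**(-3 t / rt60) is -60 dB at t = rt60.
    envelope = 10.0 ** (-3.0 * t / rt60)
    return rng.standard_normal(n) * envelope


def estimate_t20(rir, fs):
    """Estimate T20 from the Schroeder backward-integrated energy decay:
    fit a line between -5 dB and -25 dB, then extrapolate to -60 dB."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]
    edc_db = 10.0 * np.log10(energy / energy[0])
    t = np.arange(len(rir)) / fs
    mask = (edc_db <= -5.0) & (edc_db >= -25.0)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # slope in dB per second
    return -60.0 / slope


def add_reverb(signal, rir):
    """Convolve a dry signal with an impulse response, truncate to the
    original length, and peak-normalize."""
    wet = np.convolve(signal, rir)[: len(signal)]
    return wet / np.max(np.abs(wet))


fs = 16000
t = np.arange(fs) / fs  # 1 s of signal
# Stand-in for a sustained vowel: a few harmonics of a 200 Hz fundamental.
dry = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 6))

rir = synthetic_rir(rt60=0.53, fs=fs)
wet = add_reverb(dry, rir)
t20 = estimate_t20(rir, fs)
print(f"estimated T20: {t20:.2f} s")
```

The reverberant signal `wet` could then be saved and passed to the same analysis as the dry signal to compare metrics such as shimmer or CPPs across conditions. A measured or algorithmically generated impulse response would replace `synthetic_rir` in practice.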