Anand Supraja
Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida.
J Voice. 2025 Jul;39(4):1131.e31-1131.e43. doi: 10.1016/j.jvoice.2023.02.014. Epub 2023 Mar 16.
Clinical assessment of voice quality (VQ) often combines sustained phonations with longer, more complex vocalizations. The purpose of this study was to compare the perceived vocal breathiness and vocal roughness of sustained phonations and connected speech across a wide range of dysphonia severity, and to evaluate their relationship with acoustic measures and bioinspired models of breathiness and roughness.
A VQ dimension-specific single-variable matching task (SVMT) was used to index the perceived breathiness or roughness of five male and five female talkers based on a sustained /a/ phonation and the fifth CAPE-V sentence. Perceived breathiness and roughness judgments were obtained from 10 listeners. Cepstral peak (an acoustic measure) and pitch strength (a psychoacoustic measure) were used to predict perceived breathiness, and autocorrelation peak (acoustic) and temporal envelope standard deviation (EnvSD; psychoacoustic) were used to predict perceived roughness.
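For readers unfamiliar with the signal measures named above, the Python sketch below computes simplified, broadband versions of three of them on a single analysis window. It is an illustrative sketch under stated assumptions, not the study's implementation. The function names, the 60 to 400 Hz F0 search range, the Hann window, and the normalization of EnvSD by the envelope mean are all hypothetical choices. The cepstral peak here is the raw peak height without the regression-line normalization used in cepstral peak prominence, and EnvSD is taken from one broadband Hilbert envelope, which may differ from the study's procedure. The pitch strength model is not sketched because it depends on an auditory model not described in this abstract.

```python
import numpy as np

# Hypothetical F0 search range (Hz); the study's range is not stated.
F0_MIN, F0_MAX = 60.0, 400.0


def cepstral_peak(x, fs, f0_min=F0_MIN, f0_max=F0_MAX):
    """Height of the largest real-cepstrum peak at quefrencies matching
    periods between 1/f0_max and 1/f0_min. Simplified: no regression-line
    normalization, so this is not cepstral peak prominence (CPP)."""
    x = np.asarray(x, dtype=float) * np.hanning(len(x))
    # Log-magnitude spectrum in dB; the small offset avoids log10(0).
    spectrum_db = 20.0 * np.log10(np.abs(np.fft.rfft(x)) + 1e-12)
    # Real cepstrum: inverse FFT of the log spectrum (index = quefrency in samples).
    cepstrum = np.fft.irfft(spectrum_db, n=len(x))
    q_lo, q_hi = int(fs / f0_max), int(fs / f0_min)
    return float(np.max(cepstrum[q_lo:q_hi + 1]))


def autocorrelation_peak(x, fs, f0_min=F0_MIN, f0_max=F0_MAX):
    """Largest normalized autocorrelation value at lags matching F0 in range."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Zero-pad to 2n so the FFT-based autocorrelation is linear, not circular.
    spec = np.fft.rfft(x, 2 * n)
    ac = np.fft.irfft(np.abs(spec) ** 2, n=2 * n)[:n]
    ac = ac / ac[0]  # normalize so lag 0 equals 1
    lag_lo, lag_hi = int(fs / f0_max), int(fs / f0_min)
    return float(np.max(ac[lag_lo:lag_hi + 1]))


def envelope_sd(x):
    """Standard deviation of the Hilbert envelope divided by its mean
    (hypothetical normalization so overall level does not matter)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Analytic signal via FFT: keep DC (and Nyquist), double positive bins.
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    envelope = np.abs(np.fft.ifft(np.fft.fft(x) * h))
    return float(np.std(envelope) / np.mean(envelope))
```

With a 1 s signal sampled at 16 kHz, a strictly periodic pulse train gives a far larger cepstral peak than white noise, a 200 Hz sine gives an autocorrelation peak close to 1, and slow amplitude modulation raises EnvSD above the near-zero value of a steady tone.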
High intra- and inter-listener reliability was observed for both sustained phonations and connected speech. Perceived breathiness and roughness of sustained vowels and sentences obtained using SVMT were highly correlated for most dysphonic voices. The pitch strength model of breathiness captured a larger proportion of perceptual variance than cepstral peak for both vowels and sentences. Autocorrelation peak was strongly correlated with perceived roughness in sentences, whereas EnvSD was strongly correlated with perceived roughness in vowels.
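The comparison of predictors above rests on how much of the variance in listener judgments each one accounts for. A minimal sketch, assuming a straight-line least-squares fit (the abstract does not state which fit was used, and the study may have used a nonlinear one), computes the proportion of variance explained (R²) for each candidate predictor and selects the largest. The function names are hypothetical.

```python
import numpy as np


def variance_explained(predictor, perceived):
    """R^2 of a least-squares straight-line fit of perceived scores on one
    predictor (equal to the squared Pearson correlation)."""
    x = np.asarray(predictor, dtype=float)
    y = np.asarray(perceived, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residual = y - (slope * x + intercept)
    return float(1.0 - np.sum(residual ** 2) / np.sum((y - y.mean()) ** 2))


def best_predictor(candidates, perceived):
    """Return the name of the candidate with the highest R^2, plus all scores.
    `candidates` maps a predictor name to its per-voice values."""
    scores = {name: variance_explained(v, perceived) for name, v in candidates.items()}
    return max(scores, key=scores.get), scores
```

For a straight-line fit, R² equals the squared Pearson correlation, so this comparison ranks predictors the same way as the absolute value of r.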
The results provide evidence that perception of VQ via SVMT can be successfully extended to connected speech, and that computational models of VQ can be readily adapted to connected speech. Such automated models of VQ perception are valuable because of their computational efficiency and their ability to accurately capture the non-linearities of the human auditory system.