Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida.
AventuSoft LLC, Boca Raton, Florida.
J Voice. 2019 Jul;33(4):473-481. doi: 10.1016/j.jvoice.2017.12.018. Epub 2018 May 24.
This study aims to determine the sensitivity of perceptual and computational correlates of breathy and rough voice quality (VQ) across multiple vowel categories using single-variable matching tasks (SVMTs).
Sustained phonations of /a/, /i/, and /u/ from 20 dysphonic talkers (10 with primarily breathy voices and 10 with primarily rough voices) were selected from the University of Florida Dysphonic Voice Database. For primarily breathy voices, perceived breathiness was judged, and for primarily rough voices, perceived roughness was judged by the same group of 10 listeners using an SVMT with five replicates per condition. Measures of pitch strength, cepstral peak, and autocorrelation peak were applied to models of the perceptual data.
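As a rough illustration of one of the computational measures named above, the following is a minimal sketch of a normalized autocorrelation peak for a sustained vowel. It is not the study's implementation: the function name `autocorrelation_peak`, the pitch search range (`f0_min`, `f0_max`), and the choice of a biased, mean-removed autocorrelation are all assumptions made for illustration.

```python
import numpy as np


def autocorrelation_peak(signal, sample_rate, f0_min=70.0, f0_max=500.0):
    """Return (peak_value, lag_in_samples) of the normalized autocorrelation.

    The search is limited to lags that correspond to fundamental
    frequencies between f0_min and f0_max (hypothetical defaults).
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove any DC offset before correlating

    # Full autocorrelation; keep only the non-negative lags.
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    if acf[0] == 0:
        raise ValueError("signal has zero energy")
    acf = acf / acf[0]  # normalize so that lag 0 equals 1

    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), len(x) - 1)
    if lag_min > lag_max:
        raise ValueError("signal too short for the requested pitch range")

    # Highest autocorrelation value inside the allowed lag range.
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return float(acf[lag]), lag
```

Under these assumptions, a strongly periodic phonation yields a peak close to 1, while added noise or irregular modulation lowers it, which is why a measure of this kind is a plausible correlate of voice quality.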
Intra- and inter-rater reliability were high for both the breathiness and the roughness perceptual tasks. For breathiness judgments, the effect of vowel was small. Averaged over all talkers and listeners, breathiness judgments for /a/, /i/, and /u/ were -11.6, -11.2, and -12.2 dB noise-to-signal ratio, respectively. For roughness judgments, the effect of vowel was larger. The perceived roughness of /a/ was higher than that of /i/ or /u/ by 3 dB modulation depth. Pitch strength was the most accurate predictor of breathiness matching (r = 0.84-0.94 across vowels), and log-transformed autocorrelation peak was the most accurate predictor of roughness matching (r = 0.59-0.83 across vowels).
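The predictor accuracies above are reported as correlation coefficients. The sketch below shows one way such a value could be computed for a log-transformed predictor. It assumes that r is a Pearson correlation and that the transform is base-10, neither of which the abstract states (the log base does not affect r). The names `pearson_r` and `log_predictor_correlation` are hypothetical.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.size < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    if denom == 0:
        raise ValueError("one of the inputs is constant")
    return float((xc * yc).sum() / denom)


def log_predictor_correlation(predictor, matches_db):
    """Correlate a log10-transformed predictor with matching results in dB.

    Assumption: predictor values are strictly positive (for example,
    autocorrelation peak values between 0 and 1).
    """
    p = np.asarray(predictor, dtype=float)
    if np.any(p <= 0):
        raise ValueError("predictor values must be positive for a log transform")
    return pearson_r(np.log10(p), matches_db)
```

A call such as `log_predictor_correlation(peaks, matches_db)` would take one predictor value and one mean matching result per talker. Both arrays are placeholders here, not data from the study.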
Breathiness is more consistently represented across vowels for dysphonic voices than roughness. This work represents a critical step in advancing studies of voice quality perception from single vowels to running speech.