Englert Marina, Madazio Glaucya, Gielow Ingrid, Lucero Jorge, Behlau Mara
Department of Speech Language Pathology and Audiology, Universidade Federal de São Paulo, São Paulo, Brazil; Voice Department, Centro de Estudos da Voz-CEV, São Paulo, Brazil.
J Voice. 2016 Sep;30(5):639.e17-23. doi: 10.1016/j.jvoice.2015.07.017. Epub 2015 Aug 31.
OBJECTIVES/HYPOTHESIS: To verify listeners' ability to discriminate between human and synthesized voice samples.
This is a prospective study.
A total of 70 subjects, 20 voice specialist speech-language pathologists (V-SLPs), 20 general SLPs (G-SLPs), and 30 naive listeners (NLs), participated in a listening task in which they simply classified each stimulus as human or synthesized. Thirty-six voice samples, 18 human and 18 synthesized vowels, male and female (9 each), with different types and degrees of deviation, were presented; 50% of the stimuli were repeated to verify intrarater consistency. Human voices were collected from a vocal clinic database. Voice disorders were simulated by perturbing vocal frequency with jitter (roughness), adding noise (breathiness), and increasing tension and decreasing separation of the vocal folds (strain).
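The jitter and additive-noise perturbations described above can be illustrated with a minimal sketch. This is not the authors' actual synthesis procedure (the paper does not specify an implementation); it is a simplified illustration assuming NumPy, a sine-per-cycle glottal proxy, and hypothetical parameter names (`jitter_pct`, `snr_db`):

```python
import numpy as np

def synth_vowel(f0=120.0, jitter_pct=1.0, snr_db=30.0, dur=1.0, sr=16000, seed=0):
    """Illustrative vowel-like signal with jitter (roughness) and noise (breathiness)."""
    rng = np.random.default_rng(seed)
    cycles = []
    t = 0
    while t < dur * sr:
        # Jitter: random cycle-to-cycle deviation of the fundamental frequency,
        # expressed as a percentage of f0.
        f = f0 * (1.0 + rng.normal(0.0, jitter_pct / 100.0))
        period = max(int(round(sr / f)), 2)
        # One glottal cycle, approximated here by a single sine period.
        cycles.append(np.sin(2 * np.pi * np.arange(period) / period))
        t += period
    voiced = np.concatenate(cycles)[: int(dur * sr)]
    # Breathiness: additive white noise scaled to the requested signal-to-noise ratio.
    noise = rng.normal(0.0, 1.0, voiced.size)
    noise *= np.sqrt(np.mean(voiced**2) / np.mean(noise**2)) * 10 ** (-snr_db / 20)
    return voiced + noise
```

Raising `jitter_pct` makes the signal rougher; lowering `snr_db` makes it breathier, loosely mirroring the two perturbation types used to simulate deviated voices.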
The mean error rate across all groups was 37.8%: 31.9% for V-SLPs, 39.3% for G-SLPs, and 40.8% for NLs. V-SLPs had a lower mean percentage of error for synthesized (24.7%), breathy (36.7%), synthesized breathy (30.8%), tense (25%), and female (27.5%) voices. G-SLPs and NLs showed equal mean percentages of error across all voice classifications. Taken together, the groups showed no difference in mean percentage of error between human and synthesized voices (P = 0.452).
The quality of the synthesized samples was very high. V-SLPs made fewer errors, which suggests that auditory training assists in vocal analysis tasks.