Perceptual Error Analysis of Human and Synthesized Voices.

Author information

Englert Marina, Madazio Glaucya, Gielow Ingrid, Lucero Jorge, Behlau Mara

Affiliations

Universidade Federal de São Paulo, São Paulo, Brazil; Centro de Estudos da Voz-CEV, São Paulo, Brazil.

Centro de Estudos da Voz-CEV, São Paulo, Brazil.

Publication information

J Voice. 2017 Jul;31(4):516.e5-516.e18. doi: 10.1016/j.jvoice.2016.12.015. Epub 2017 Jan 12.

Abstract

OBJECTIVE/HYPOTHESIS

To assess the quality of synthesized voices through listeners' skills in discriminating human and synthesized voices.

STUDY DESIGN

Prospective study.

METHODS

Eighteen human voices with different types and degrees of deviation (roughness, breathiness, and strain, with three degrees of deviation: mild, moderate, and severe) were selected by three voice specialists. Synthesized samples with the same deviations as the human voices were produced with the VoiceSim system. The manipulated parameters were vocal frequency perturbation (roughness), additive noise (breathiness), and, for strain, increasing tension, subglottal pressure, and decreasing vocal fold separation. Two hundred sixty-nine listeners were divided into three groups: voice specialist speech-language pathologists (V-SLPs), general clinician SLPs (G-SLPs), and naive listeners (NLs). The SLP listeners also indicated the type and degree of deviation.

RESULTS

The listeners misclassified 39.3% of the voices: 42.3% of the synthesized samples and 36.4% of the human samples (P = 0.001). V-SLPs had the lowest error rate in identifying voice nature (34.6%); G-SLPs and NLs identified almost half of the synthesized samples as human (46.9% and 45.6%, respectively). The male voices were more susceptible to misidentification. The synthesized breathy samples generated greater perceptual confusion. The samples with severe deviation seemed to be more susceptible to errors. The synthesized female deviations were correctly classified. The male breathiness and strain were identified as roughness.

CONCLUSION

VoiceSim produced stimuli very similar to the voices of patients with dysphonia. V-SLPs were better able to classify voices as human or synthesized. VoiceSim is better at simulating vocal breathiness and female deviations; the male samples need adjustment.
