The Effects of Modulating Fundamental Frequency and Speech Rate on the Intelligibility, Communication Efficiency, and Perceived Naturalness of Synthetic Speech.

Authors

Vojtech Jennifer M, Noordzij Jacob P, Cler Gabriel J, Stepp Cara E

Affiliations

Department of Biomedical Engineering, Boston University, MA.

Department of Speech, Language, and Hearing Sciences, Boston University, MA.

Publication information

Am J Speech Lang Pathol. 2019 Jul 15;28(2S):875-886. doi: 10.1044/2019_AJSLP-MSC18-18-0052.

Abstract

Purpose: This study investigated how modulating fundamental frequency (f0) and speech rate differentially affects the naturalness, intelligibility, and communication efficiency of synthetic speech.

Method: Sixteen sentences of varying prosodic content were developed via a speech synthesizer. The f0 contour and speech rate of these sentences were altered to produce 4 stimulus sets: (a) normal rate with a fixed f0 level, (b) slow rate with a fixed f0 level, (c) normal rate with prosodically natural f0 variation, and (d) normal rate with prosodically unnatural f0 variation. Sixteen listeners provided orthographic transcriptions and judgments of naturalness for these stimuli.

Results: Sentences with f0 variation were rated as more natural than those with a fixed f0 level. Conversely, sentences with a fixed f0 level demonstrated higher intelligibility than those with f0 variation. Speech rate did not affect the intelligibility of stimuli with a fixed f0 level. Communication efficiency was highest for sentences produced at a normal rate and a fixed f0 level.

Conclusions: Sentence-level f0 variation increased naturalness ratings of synthesized speech, whether the variation was prosodically natural or not. However, these f0 variations reduced intelligibility. There is evidence of a trade-off between naturalness and intelligibility of synthesized speech, which may impact future speech synthesis designs.

Supplemental Material: https://doi.org/10.23641/asha.8847833
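The abstract does not state how intelligibility or communication efficiency were scored. The sketch below assumes one common operationalization: intelligibility as the percentage of target words found in a listener's transcription, and communication efficiency as intelligible words per minute. The function names, the word-matching rule, the example sentence, and the durations are all hypothetical and are not taken from the study.

```python
def words_correct(target: str, transcript: str) -> int:
    """Count target words that also appear in the transcript.

    Assumed scoring rule: case-insensitive, order-insensitive, and each
    transcribed word can match at most one target word.
    """
    remaining = transcript.lower().split()
    hits = 0
    for word in target.lower().split():
        if word in remaining:
            remaining.remove(word)
            hits += 1
    return hits


def intelligibility(target: str, transcript: str) -> float:
    """Percent of target words correctly transcribed (assumed definition)."""
    return 100.0 * words_correct(target, transcript) / len(target.split())


def communication_efficiency(target: str, transcript: str, duration_s: float) -> float:
    """Intelligible words per minute (assumed definition)."""
    return words_correct(target, transcript) * 60.0 / duration_s


# Hypothetical stimulus and transcription, for illustration only.
target = "the boy ran to the store"
transcript = "the boy ran to a store"

normal_rate = communication_efficiency(target, transcript, duration_s=2.0)
slow_rate = communication_efficiency(target, transcript, duration_s=3.0)
```

Under these assumptions, the same transcription yields the same intelligibility at either rate, while the slower rendition yields fewer intelligible words per minute. That is consistent with the reported pattern in which a slow rate did not change intelligibility for fixed-f0 stimuli yet communication efficiency was highest at the normal rate.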
