A Deep Learning Based Approach to Synthesize Intelligible Speech with Limited Temporal Envelope Information.

Publication information

Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:1972-1976. doi: 10.1109/EMBC48229.2022.9871247.

Abstract

Envelope waveforms can be extracted from multiple frequency bands of a speech signal, and they carry information that is important for the intelligibility of human speech. This study investigated whether a deep learning-based model that takes temporal envelope information as input features can synthesize intelligible speech, and how reducing the number of temporal envelopes (from 8 to 2 in this work) affects the intelligibility of the synthesized speech. In the objective evaluation, speech synthesized by the proposed approach achieved high short-time objective intelligibility (STOI) scores (i.e., 0.8) on average in each test condition. In the human listening test, the average word correct rate across eight listeners was higher than 97.5%. These findings indicate that the proposed deep learning-based system is a potential approach for synthesizing highly intelligible speech from limited envelope information.
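To make the input features concrete, the following is a minimal sketch of how temporal envelopes might be extracted from several frequency bands of a signal. It is not the authors' pipeline: the abstract does not describe the filterbank, so the log-spaced band edges, the 80 to 6000 Hz range, the FFT-based band-pass filter, the Hilbert envelope, and the names `band_edges` and `temporal_envelopes` are all assumptions made for illustration. Only the band counts (8 and 2) come from the abstract.

```python
import numpy as np


def band_edges(n_bands, f_lo=80.0, f_hi=6000.0):
    """Log-spaced band edges in Hz (assumed spacing, not taken from the paper)."""
    return np.geomspace(f_lo, f_hi, n_bands + 1)


def temporal_envelopes(signal, fs, n_bands=8, f_lo=80.0, f_hi=6000.0):
    """Return an (n_bands, len(signal)) array with one Hilbert envelope per band.

    Assumes f_hi is below the Nyquist frequency fs / 2.
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    spectrum = np.fft.fft(signal)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    edges = band_edges(n_bands, f_lo, f_hi)

    envelopes = np.empty((n_bands, n))
    for i in range(n_bands):
        # Keep only the positive-frequency bins inside this band.
        mask = (freqs >= edges[i]) & (freqs < edges[i + 1])
        # Doubling the positive-frequency bins and dropping the negative ones
        # yields the analytic signal of the band-passed waveform.
        analytic = np.fft.ifft(2.0 * spectrum * mask)
        # The magnitude of the analytic signal is the temporal envelope.
        envelopes[i] = np.abs(analytic)
    return envelopes


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    # Toy input: a 1 kHz carrier with a slow 4 Hz amplitude modulation.
    x = (1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 1000 * t)
    for n_bands in (8, 2):
        env = temporal_envelopes(x, fs, n_bands=n_bands)
        print(n_bands, "bands ->", env.shape)
```

With fewer bands, each envelope summarizes a wider frequency range, so spectral detail is lost while the slow amplitude fluctuations remain. This is the kind of reduced input the study evaluates when it goes from 8 envelopes down to 2.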
