Improving Post-Filtering of Artificial Speech Using Pre-Trained LSTM Neural Networks

Author information

Coto-Jiménez Marvin

Affiliation

Escuela de Ingeniería Eléctrica, Universidad de Costa Rica, San José 11501-2060, Costa Rica.

Publication information

Biomimetics (Basel). 2019 May 28;4(2):39. doi: 10.3390/biomimetics4020039.

Abstract

Several researchers have contemplated deep learning-based post-filters to increase the quality of statistical parametric speech synthesis. These post-filters map the synthetic speech to the natural speech, considering the different parameters separately and trying to reduce the gap between them. Long Short-Term Memory (LSTM) neural networks have been applied successfully for this purpose, but many aspects of the results and of the process itself can still be improved. In this paper, we introduce a new pre-training approach for the LSTM, with the objective of enhancing the quality of the synthesized speech, particularly its spectrum, in a more efficient manner. Our approach begins with an auto-associative training of one LSTM network, which is then used as an initialization for the post-filters. We show the advantages of this initialization for enhancing the Mel-Frequency Cepstral parameters of synthetic speech. Results show that, in most cases, the initialization achieves better enhancement of the statistical parametric speech spectrum than the common random initialization of the networks.
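The two-stage idea described in the abstract — auto-associative pre-training on natural speech, then reusing the learned weights to initialize a post-filter that maps synthetic parameters toward natural ones — can be sketched in miniature. The sketch below is illustrative only, not the paper's implementation: it uses a plain linear map fitted by gradient descent as a stand-in for the LSTM, and random toy vectors in place of real MFCC frames; all variable names and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data: 500 frames of 24-dimensional MFCC-like vectors.
# "synthetic" frames are a distorted copy of the "natural" ones.
natural = rng.normal(size=(500, 24))
synthetic = natural * 0.8 + rng.normal(scale=0.1, size=natural.shape)

def fit(X, Y, W, lr=0.05, epochs=300):
    """Gradient descent on mean squared error for the model Y ~ X @ W."""
    W = W.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ W - Y) / len(X)
        W -= lr * grad
    return W

# Stage 1: auto-associative pre-training -- map natural speech to itself,
# so the weights converge toward an identity-like transform.
W_init = rng.normal(scale=0.01, size=(24, 24))
W_pre = fit(natural, natural, W_init)

# Stage 2: fine-tune the pre-trained weights as a post-filter that maps
# synthetic parameters toward the natural ones.
W_post = fit(synthetic, natural, W_pre)

mse_before = np.mean((synthetic - natural) ** 2)
mse_after = np.mean((synthetic @ W_post - natural) ** 2)
print(round(mse_before, 4), round(mse_after, 4))
```

The point of the sketch is the initialization strategy, not the model class: stage 2 starts from the stage-1 weights rather than from a random matrix, mirroring how the paper initializes its LSTM post-filters from an auto-associatively trained network.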

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/203f/6630405/e0fba279d9c1/biomimetics-04-00039-g001.jpg
