Genetic algorithm to estimate the input parameters of Klatt and HLSyn formant-based speech synthesizers.

Authors

Araújo Fabíola, Filho José, Klautau Aldebaro

Affiliation

Signal Processing Laboratory (LaPS) - Federal University of Pará, Rua Augusto Corrêa 01, Belém, PA, Brazil.

Publication

Biosystems. 2016 Dec;150:190-193. doi: 10.1016/j.biosystems.2016.10.002. Epub 2016 Oct 18.

DOI:10.1016/j.biosystems.2016.10.002

PMID:27769749

Abstract

Voice imitation consists of estimating a synthesizer's input parameters so that its output mimics a target speech signal. This is a difficult inverse problem because the mapping is time-varying, non-linear, and many-to-one. Done manually, it typically requires a considerable amount of time. This work presents the evolution of a system based on a genetic algorithm (GA) that automatically estimates the input parameters of the Klatt and HLSyn formant synthesizers using an analysis-by-synthesis process. Results are presented for natural (human-generated) speech from three male speakers. The GA-based system outperforms the baseline, Winsnoori, on four objective figures of merit and in a subjective test. The GA with the Klatt synthesizer generated voices similar to the target, and the subjective tests indicate improved quality of the synthetic voices compared with those produced by the baseline.
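
The analysis-by-synthesis loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the fitness function, the genetic operators, or the population settings, so everything below is an assumption chosen for illustration: `toy_synth` is a hypothetical single-resonance stand-in for a formant synthesizer (it is not Klatt or HLSyn), the error is a plain time-domain mean squared error, and the GA uses elitist selection with uniform crossover and Gaussian mutation.

```python
import math
import random

# Assumed search ranges for the two toy parameters: (frequency Hz, bandwidth Hz).
BOUNDS = [(200.0, 3000.0), (30.0, 500.0)]


def toy_synth(params, n=64, rate=8000.0):
    """Hypothetical stand-in for a synthesizer: one damped resonance."""
    freq, bandwidth = params
    return [
        math.exp(-math.pi * bandwidth * t / rate)
        * math.sin(2.0 * math.pi * freq * t / rate)
        for t in range(n)
    ]


def error(params, target):
    """Mean squared error between the synthesized frame and the target frame."""
    synth = toy_synth(params, len(target))
    return sum((s - t) ** 2 for s, t in zip(synth, target)) / len(target)


def random_individual(rng):
    """Draw one candidate parameter vector uniformly inside BOUNDS."""
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(ind, rng, scale=0.05):
    """Gaussian mutation, clipped so that genes stay inside BOUNDS."""
    return [
        min(hi, max(lo, g + rng.gauss(0.0, scale * (hi - lo))))
        for g, (lo, hi) in zip(ind, BOUNDS)
    ]


def estimate(target, pop_size=40, generations=60, seed=0):
    """Search for parameters whose synthesized frame best matches the target."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Analysis-by-synthesis: synthesize every candidate and rank by error.
        pop.sort(key=lambda ind: error(ind, target))
        elite = pop[: pop_size // 4]  # the best quarter survives unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return min(pop, key=lambda ind: error(ind, target))


if __name__ == "__main__":
    target_frame = toy_synth([700.0, 100.0])
    best = estimate(target_frame)
    print("estimated parameters:", best)
    print("error:", error(best, target_frame))
```

Because the elite is carried over unchanged, the best error never increases from one generation to the next. A real system would estimate many more parameters per frame and would have to handle their variation over time, which this single-frame sketch ignores.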
