Childers D G, Hu H T
Department of Electrical Engineering, University of Florida, Gainesville 32611-2024.
J Acoust Soc Am. 1994 Oct;96(4):2026-36. doi: 10.1121/1.411319.
This paper describes a linear predictive (LP) speech synthesis procedure that resynthesizes speech using a 6th-order polynomial waveform to model the glottal excitation. The coefficients of the polynomial model form a vector that represents the glottal excitation waveform for one pitch period. A glottal excitation codebook with 32 entries for voiced excitation is designed and trained using two sentences spoken by different speakers. The purpose of this approach is to demonstrate that quantization of the glottal excitation waveform does not significantly degrade the quality of speech synthesized with a glottal excitation linear predictive (GELP) synthesizer. This implementation of the LP synthesizer is patterned after both a pitch-excited LP speech synthesizer and a code excited linear predictive (CELP) speech coder. In addition to the glottal excitation codebook, we use a stochastic codebook with 256 entries for unvoiced noise excitation. Analysis techniques are described for constructing both codebooks. The GELP synthesizer, which resynthesizes speech with high quality, provides the speech scientist with a simple speech synthesis procedure that uses established analysis techniques, is able to reproduce all speech sounds, and has an excitation model waveform that is related to the derivative of the glottal flow and the integral of the residue. It is conjectured that the glottal excitation codebook approach could provide a mechanism for quantitatively comparing glottal excitation codebooks for male and female speakers, for speakers with vocal disorders, and for speakers with different voice types such as breathy and vocal fry voices.
Conceivably, one could also convert the voice of a speaker with one voice type, e.g., breathy, to the voice of a speaker with another voice type, e.g., vocal fry, by synthesizing speech using the vocal tract LP parameters for the speaker with the breathy voice excited by the glottal excitation codebook trained for vocal fry.
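As a rough illustration of the quantization step described above, the following Python sketch represents one pitch period of glottal excitation as the 7 coefficients of a 6th-order polynomial and replaces it with the nearest entry of a 32-entry codebook. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the codebook values are placeholders rather than trained data, and the squared Euclidean distance, the time normalization to [0, 1), the 80-sample period, and all function names are hypothetical choices made for the example.

```python
POLY_ORDER = 6       # 6th-order polynomial, as stated in the abstract
CODEBOOK_SIZE = 32   # voiced-excitation codebook entries, as stated in the abstract


def eval_polynomial(coeffs, t):
    """Evaluate c0 + c1*t + ... + cN*t**N using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


def excitation_for_period(coeffs, period_samples):
    """Sample the polynomial over one pitch period.

    Assumption: time is normalized to [0, 1) within the period.
    """
    return [eval_polynomial(coeffs, n / period_samples)
            for n in range(period_samples)]


def nearest_codeword(coeffs, codebook):
    """Return the index of the codebook entry closest to coeffs.

    Assumption: squared Euclidean distance on the coefficient vector;
    the abstract does not state which distortion measure is used.
    """
    def distance(entry):
        return sum((a - b) ** 2 for a, b in zip(coeffs, entry))
    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))


# Placeholder codebook (NOT trained data): entry k holds alternating-sign
# coefficients scaled by (k + 1) * 0.1, so that entries are well separated.
codebook = [[(k + 1) * 0.1 * (-1) ** j for j in range(POLY_ORDER + 1)]
            for k in range(CODEBOOK_SIZE)]

# A hypothetical analyzed coefficient vector lying close to entry 5.
target = [c + 0.01 for c in codebook[5]]

index = nearest_codeword(target, codebook)
waveform = excitation_for_period(codebook[index], period_samples=80)
```

Under these assumptions only `index` would need to be stored for each voiced pitch period, and the synthesizer would regenerate the excitation waveform from the selected codebook entry.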