Childers D G, Wong C F
Department of Electrical Engineering, University of Florida, Gainesville 32611-2024.
IEEE Trans Biomed Eng. 1994 Jul;41(7):663-71. doi: 10.1109/10.301733.
The quality of synthetic speech is affected by two factors: intelligibility and naturalness. At present, synthesized speech may be highly intelligible, but often sounds unnatural. Speech intelligibility depends on the synthesizer's ability to reproduce the formants, the formant bandwidths, and formant transitions, whereas speech naturalness is thought to depend on the excitation waveform characteristics for voiced and unvoiced sounds. Voiced sounds may be generated by a quasiperiodic train of glottal pulses of specified shape exciting the vocal tract filter. It is generally assumed that the glottal source and the vocal tract filter are linearly separable and do not interact. However, this assumption is often not valid, since it has been observed that appreciable source-tract interaction can occur in natural speech. Previous experiments in speech synthesis have demonstrated that the naturalness of synthetic speech does improve when source-tract interaction is simulated in the synthesis process. The purpose of this paper is two-fold: 1) to present an algorithm for automatically measuring source-tract interaction for voiced speech, and 2) to present a simple speech production model that incorporates source-tract interaction into the glottal source model. This glottal source model controls: 1) the skewness of the glottal pulse, and 2) the amount of the first formant ripple superimposed on the glottal pulse. A major application of the results of this paper is the modeling of vocal disorders.
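As an illustration of the two controls named in the abstract (pulse skewness and first formant ripple), the following is a minimal sketch and not the authors' model. It assumes a Rosenberg-type trigonometric pulse shape, a ripple modeled as a sinusoid at the first formant frequency scaled by the instantaneous flow, and hypothetical parameter names (`open_quotient`, `skew`, `ripple_amp`, `f1_hz`, `fs_hz`) chosen only for this example.

```python
import math


def glottal_pulse(n_samples, open_quotient=0.6, skew=0.7,
                  ripple_amp=0.1, f1_hz=500.0, fs_hz=10000.0):
    """Return one pitch period of an illustrative glottal flow pulse.

    Assumptions (not taken from the paper):
      * Rosenberg-type shape: raised-cosine opening, quarter-cosine closing.
      * skew = fraction of the open phase spent opening
        (0.5 is symmetric, larger values delay the peak).
      * The F1 ripple is a sinusoid at f1_hz whose amplitude follows the
        flow, so it vanishes when the glottis is closed.
    """
    if not (0.0 < open_quotient <= 1.0 and 0.0 < skew < 1.0):
        raise ValueError("open_quotient must be in (0, 1], skew in (0, 1)")

    n_open = int(round(open_quotient * n_samples))  # open-phase length
    n_rise = max(1, int(round(skew * n_open)))      # opening-phase length
    n_fall = max(1, n_open - n_rise)                # closing-phase length

    pulse = []
    for n in range(n_samples):
        if n < n_rise:
            # Opening phase: flow rises smoothly from 0 toward 1.
            base = 0.5 * (1.0 - math.cos(math.pi * n / n_rise))
        elif n < n_rise + n_fall:
            # Closing phase: flow falls from 1 to 0, faster when skew is large.
            base = math.cos(0.5 * math.pi * (n - n_rise) / n_fall)
        else:
            # Closed phase: no flow, hence no ripple.
            base = 0.0
        ripple = ripple_amp * base * math.sin(2.0 * math.pi * f1_hz * n / fs_hz)
        pulse.append(base + ripple)
    return pulse


if __name__ == "__main__":
    # 100 samples at 10 kHz corresponds to a 100 Hz fundamental.
    smooth = glottal_pulse(100, ripple_amp=0.0)
    rippled = glottal_pulse(100, ripple_amp=0.2)
    print("peak index without ripple:", smooth.index(max(smooth)))
    print("largest deviation due to ripple:",
          max(abs(a - b) for a, b in zip(smooth, rippled)))
```

Setting `ripple_amp` to zero recovers a non-interactive source, so the same function can be used to compare synthesis with and without the simulated interaction. Scaling the ripple by the flow is a design choice made for simplicity here; a physically motivated model would derive the ripple from the vocal tract load on the glottis.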