Wouters Johan, Macon Michael W
Center for Spoken Language Understanding, OGI School of Science and Engineering, Oregon Health & Science University, Beaverton 97006, USA.
J Acoust Soc Am. 2002 Jan;111(1 Pt 1):428-38. doi: 10.1121/1.1428263.
In Paper I [J. Wouters and M. Macon, J. Acoust. Soc. Am. 111, 417-427 (2002)], the effects of prosodic factors on the spectral rate of change of phoneme transitions were analyzed for a balanced speech corpus. The results showed that the spectral rate of change, defined as the root-mean-square of the first three formant slopes, increased with linguistic prominence, i.e., in stressed syllables, in accented words, in sentence-medial words, and in clearly articulated speech. In the present paper, an initial approach is described to integrate the results of Paper I in a concatenative synthesis framework. The target spectral rate of change of acoustic units is predicted based on the prosodic structure of utterances to be synthesized. Then, the spectral shape of the acoustic units is modified according to the predicted spectral rate of change. Experiments show that the proposed approach provides control over the degree of articulation of acoustic units, and improves the naturalness and intelligibility of concatenated speech in comparison to standard concatenation methods.
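The definition of spectral rate of change given above (root-mean-square of the first three formant slopes) can be illustrated with a minimal sketch. The abstract does not say how the slopes are estimated, so this example assumes formant tracks sampled at a constant frame period and takes each slope from a least-squares line fit across the transition. The function names, the 5 ms frame period, and the Hz/ms units are hypothetical choices for illustration, not the procedure of Paper I.

```python
import math

def fit_slope(times_ms, values_hz):
    """Least-squares slope of values_hz against times_ms, in Hz per ms."""
    n = len(times_ms)
    t_mean = sum(times_ms) / n
    v_mean = sum(values_hz) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times_ms, values_hz))
    den = sum((t - t_mean) ** 2 for t in times_ms)
    return num / den

def spectral_rate_of_change(formant_tracks_hz, frame_period_ms):
    """RMS of the slopes of the given formant tracks (e.g. F1, F2, F3).

    formant_tracks_hz: one list of frame values in Hz per formant.
    frame_period_ms: time between frames in ms (assumed constant).
    Assumption: each slope is a straight-line fit over the whole transition.
    """
    slopes = []
    for track in formant_tracks_hz:
        times = [i * frame_period_ms for i in range(len(track))]
        slopes.append(fit_slope(times, track))
    return math.sqrt(sum(s * s for s in slopes) / len(slopes))

# Synthetic transition: F1 rises 10 Hz/ms, F2 falls 20 Hz/ms, F3 stays flat.
frame_times = [5.0 * i for i in range(10)]
f1 = [500.0 + 10.0 * t for t in frame_times]
f2 = [1800.0 - 20.0 * t for t in frame_times]
f3 = [2500.0 for _ in frame_times]
roc = spectral_rate_of_change([f1, f2, f3], frame_period_ms=5.0)
```

For this synthetic transition the slopes are 10, -20, and 0 Hz/ms, so the measure equals sqrt((100 + 400 + 0) / 3). Under the findings summarized above, transitions in more prominent contexts would be expected to give larger values of this measure.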