O'Shaughnessy D
J Acoust Soc Am. 1984 Dec;76(6):1664-72. doi: 10.1121/1.391613.
Understanding how the durations of acoustic segments vary in natural language can lead to more intelligible synthetic speech, and to improved automatic recognition. Toward this goal, a 111-word French paragraph was read by 29 native speakers from France. Measured durations of acoustic segments were significantly shorter than those in earlier studies of stressed words in French sentences read from a list. Previously recognized trends (short schwa vowels and function words; long unvoiced fricatives, nasalized vowels, and prepausal syllables) are confirmed and quantitative results are given. Vowels were longer preceding voiced fricatives (but not prior to /r/), and were also longer at sentence-internal pauses than at the end of a sentence. Standard deviations of acoustic segment durations (at fixed positions in the paragraph) across speakers averaged less than 25% in most cases. The exceptional, larger deviations occurred primarily in segments adjacent to pauses. Speaking rate variations could account for only one-sixth of the deviations, the rest being attributable to relatively free variation across speakers. A generative model of French durations, suitable for synthesis-by-rule, is presented, and applications to automatic recognition are discussed.
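The cross-speaker statistics mentioned in the abstract (standard deviation of a segment's duration at a fixed position, and how much of it speaking rate explains) can be illustrated with a minimal sketch. This is not the paper's procedure: the durations below are made up, the function names are hypothetical, and the rate normalization (scaling each speaker to a common total duration) is only one plausible assumption about how speaking rate might be factored out.

```python
from statistics import mean, pstdev


def relative_sd(values):
    """Standard deviation expressed as a fraction of the mean."""
    return pstdev(values) / mean(values)


def per_segment_relative_sd(durations):
    """durations[s][k] is the duration (ms) of segment k for speaker s.

    Returns one relative standard deviation per segment position,
    computed across speakers.
    """
    n_segments = len(durations[0])
    return [
        relative_sd([row[k] for row in durations])
        for k in range(n_segments)
    ]


def rate_normalize(durations):
    """Assumed normalization: rescale each speaker's durations so that
    every speaker has the same total duration (the mean total)."""
    totals = [sum(row) for row in durations]
    mean_total = mean(totals)
    return [
        [d * mean_total / total for d in row]
        for row, total in zip(durations, totals)
    ]


# Hypothetical data: 3 speakers x 4 segment positions, in milliseconds.
durations = [
    [80, 120, 60, 200],
    [90, 110, 70, 260],
    [70, 130, 55, 180],
]

raw = per_segment_relative_sd(durations)
normalized = per_segment_relative_sd(rate_normalize(durations))

for k, (r, n) in enumerate(zip(raw, normalized)):
    print(f"segment {k}: raw {r:.1%}, rate-normalized {n:.1%}")
```

Comparing the raw and rate-normalized columns gives a rough sense of how much cross-speaker variation a global rate factor removes. Whatever remains after normalization corresponds to what the abstract calls relatively free variation across speakers.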