Unsupervised joint prosody labeling and modeling for Mandarin speech

Chiang Chen-Yu, Chen Sin-Horng, Yu Hsiu-Min, Wang Yih-Ru

Department of Communication Engineering, National Chiao Tung University, Hsinchu 300, Taiwan, Republic of China.

J Acoust Soc Am. 2009 Feb;125(2):1164-83. doi: 10.1121/1.3056559.

An unsupervised joint prosody labeling and modeling method for Mandarin speech is proposed. It is a new scheme for constructing statistical prosodic models and consistently labeling prosodic tags in Mandarin speech. Two types of prosodic tags are determined by four prosodic models designed to describe the hierarchy of Mandarin prosody. The first is the break at each syllable juncture, which demarcates prosodic constituents. The second is the prosodic state, which represents the pitch-level variation of a prosodic domain caused by the influence of its upper-layer prosodic constituents. The performance of the proposed method was evaluated on an unlabeled read-speech corpus uttered by an experienced female announcer. Experimental results showed that the estimated parameters of the four prosodic models were able to reveal and describe the structures and patterns of Mandarin prosody. In addition, correspondences were found between the labeled break indices and the associated words, revealing connections between prosodic and linguistic parameters; this finding further verifies the capability of the proposed method. Finally, a quantitative comparison of labeling results between the proposed method and human labelers showed that the prosodic feature distributions produced by the method were more consistent and discriminative than those produced by the human labelers, an advantage of the method for prosody modeling applications.