Shinozaki Takahiro, Ostendorf Mari, Atlas Les
Department of Electrical Engineering, University of Washington, Seattle, WA 98195-2500, USA.
J Acoust Soc Am. 2009 Sep;126(3):1500-10. doi: 10.1121/1.3183593.
Differences in speaking style are associated with more or less spectral variability, as well as different modulation characteristics. The greater variation in some styles (e.g., spontaneous speech and infant-directed speech) poses challenges for recognition but possibly also opportunities for learning more robust models, as evidenced by prior work and motivated by child language acquisition studies. In order to investigate this possibility, this work proposes a new method for characterizing speaking style (the modulation spectrum), examines spontaneous, read, adult-directed, and infant-directed styles in this space, and conducts pilot experiments in style detection and sampling for improved speech recognizer training. Speaking style classification is improved by using the modulation spectrum in combination with standard pitch and energy variation. Speech recognition experiments on a small vocabulary conversational speech recognition task show that sampling methods for training with a small amount of data benefit from the new features.