Aryal Sandesh, Gutierrez-Osuna Ricardo
Department of Computer Science and Engineering, Texas A&M University, College Station, Texas 77843.
J Acoust Soc Am. 2015 Jan;137(1):433-46. doi: 10.1121/1.4904701.
This paper presents an articulatory synthesis method to transform utterances from a second language (L2) learner to appear as if they had been produced by the same speaker but with a native (L1) accent. The approach consists of building a probabilistic articulatory synthesizer (a mapping from articulators to acoustics) for the L2 speaker, then driving the model with articulatory gestures from a reference L1 speaker. To account for differences in the vocal tract of the two speakers, a Procrustes transform is used to bring their articulatory spaces into registration. In a series of listening tests, accent conversions were rated as being more intelligible and less accented than L2 utterances while preserving the voice identity of the L2 speaker. No significant relationship was found between the intelligibility of accent-converted utterances and the proportion of phones outside the L2 inventory. Because the latter is a strong predictor of pronunciation variability in L2 speech, these results suggest that articulatory resynthesis can decouple those aspects of an utterance that are due to the speaker's physiology from those that are due to their linguistic gestures.
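The registration step mentioned above can be illustrated with a minimal sketch of a least-squares Procrustes fit. The abstract does not say which landmarks, dimensionality, or transform components (scaling, reflection) the authors used, so everything below is an assumption for illustration: the function names `fit_procrustes` and `apply_procrustes` are hypothetical, the landmarks are synthetic 2-D points, and the fit allows uniform scaling and any orthogonal matrix (including reflections).

```python
import numpy as np


def fit_procrustes(source, target):
    """Fit scale, orthogonal matrix, and offsets mapping source onto target.

    source, target: (n_points, n_dims) arrays of paired landmarks
    (hypothetical stand-ins for corresponding articulatory positions).
    """
    mu_s = source.mean(axis=0)
    mu_t = target.mean(axis=0)
    S = source - mu_s  # centered source landmarks
    T = target - mu_t  # centered target landmarks

    # Orthogonal Procrustes: R minimizes ||S @ R - T||_F over orthogonal R.
    U, sigma, Vt = np.linalg.svd(S.T @ T)
    R = U @ Vt  # may include a reflection; no determinant constraint here

    # Optimal uniform scale for the chosen R.
    scale = sigma.sum() / (S ** 2).sum()
    return scale, R, mu_s, mu_t


def apply_procrustes(points, scale, R, mu_s, mu_t):
    """Map points from the source space into the target space."""
    return scale * (points - mu_s) @ R + mu_t


# Synthetic check: build a "target speaker" space from a known transform.
rng = np.random.default_rng(0)
l1_landmarks = rng.normal(size=(20, 2))  # made-up L1 landmarks
theta = 0.3
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
l2_landmarks = 1.7 * l1_landmarks @ rotation + np.array([2.0, -1.0])

params = fit_procrustes(l1_landmarks, l2_landmarks)
mapped = apply_procrustes(l1_landmarks, *params)
print(np.allclose(mapped, l2_landmarks))
```

In this sketch the fitted transform would then be applied to every frame of the L1 speaker's articulatory trajectory before it drives the L2 speaker's synthesizer. On real data the fit is only approximate, since two vocal tracts do not differ by an exact similarity transform.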