Characterization of inter-speaker articulatory variability: A two-level multi-speaker modelling approach based on MRI data.

Author information

Clinic for Phoniatrics, Pedaudiology & Communication Disorders, University Hospital and Medical Faculty of the RWTH Aachen University, Aachen, Germany.

Université Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Grenoble, France.

Publication information

J Acoust Soc Am. 2019 Apr;145(4):2149. doi: 10.1121/1.5096631.

DOI:10.1121/1.5096631

PMID:31046321

Abstract

Speech communication relies on articulatory and acoustic codes shared between speakers and listeners despite inter-individual differences in morphology and idiosyncratic articulatory strategies. This study addresses the long-standing problem of characterizing and modelling speaker-independent articulatory strategies and inter-speaker articulatory variability. It explores a multi-speaker modelling approach based on two levels: statistically based linear articulatory models, which capture the speaker-specific articulatory variability, are in turn controlled by a speaker model, which captures the inter-speaker variability. A low-dimensional speaker model is obtained by taking advantage of the inter-speaker correlations between morphology and strategy. To validate this approach, contours of the vocal tract articulators were manually segmented on midsagittal MRI data recorded from 11 French speakers uttering 62 vowels and consonants. Using these contours, multi-speaker models with 14 articulatory components and two morphology and strategy components led to overall variance explanations of 66%-69% and root-mean-square errors of 0.36-0.38 cm, obtained in a leave-one-out procedure over the speakers. Results suggest that inter-speaker variability is more related to morphology than to idiosyncratic strategies, and illustrate the adaptation of the articulatory components to the morphology.

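The two-level idea and the leave-one-out evaluation summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain PCA at both levels, uses each speaker's mean contour as a stand-in for morphology, shares one articulatory basis across speakers, and runs on synthetic data rather than MRI contours. The function names `pca_basis` and `leave_one_speaker_out`, the array layout, and the definition of variance explained (relative to the training grand mean) are all assumptions made for illustration.

```python
import numpy as np

def pca_basis(X, n_components):
    """Return the mean of X and its first n_components principal directions (as rows)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def leave_one_speaker_out(data, n_artic=14, n_speaker=2):
    """data: (n_speakers, n_items, n_dims) array of flattened contour coordinates.

    Returns an (n_speakers, 2) array of [RMSE, variance explained] per held-out speaker.
    """
    n_speakers, _, n_dims = data.shape
    results = []
    for held_out in range(n_speakers):
        train = np.delete(data, held_out, axis=0)
        test = data[held_out]

        # Speaker level (assumption): PCA over the training speakers' mean contours.
        speaker_means = train.mean(axis=1)
        grand_mean, speaker_basis = pca_basis(speaker_means, n_speaker)

        # Articulatory level (assumption): PCA over speaker-centred articulations,
        # pooled across the training speakers.
        centred = (train - speaker_means[:, None, :]).reshape(-1, n_dims)
        _, artic_basis = pca_basis(centred, n_artic)

        # Reconstruct the held-out speaker: first its mean contour from the
        # speaker components, then each articulation from the articulatory components.
        test_mean = test.mean(axis=0)
        mean_hat = grand_mean + (test_mean - grand_mean) @ speaker_basis.T @ speaker_basis
        residual = test - mean_hat
        recon = mean_hat + residual @ artic_basis.T @ artic_basis

        err = test - recon
        rmse = np.sqrt(np.mean(err ** 2))
        var_expl = 1.0 - np.sum(err ** 2) / np.sum((test - grand_mean) ** 2)
        results.append((rmse, var_expl))
    return np.array(results)

# Synthetic stand-in data: 11 speakers, 62 items, 40 contour coordinates (hypothetical size).
rng = np.random.default_rng(0)
n_speakers, n_items, n_dims = 11, 62, 40
morph = rng.normal(size=(n_speakers, 2)) @ rng.normal(size=(2, n_dims))
artic = rng.normal(size=(n_speakers, n_items, 14)) @ rng.normal(size=(14, n_dims))
noise = 0.1 * rng.normal(size=(n_speakers, n_items, n_dims))
data = morph[:, None, :] + artic + noise

scores = leave_one_speaker_out(data)
print("mean RMSE, mean variance explained:", scores.mean(axis=0))
```

The numbers printed by this sketch depend entirely on the synthetic data and are not comparable to the 66%-69% and 0.36-0.38 cm figures reported in the abstract.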