Mathad Vikram C, Liss Julie M, Chapman Kathy, Scherer Nancy, Berisha Visar
zapr media labs, Bangalore, India, 560016.
College of Health Solutions, Arizona State University, Tempe, AZ-85287.
IEEE/ACM Trans Audio Speech Lang Process. 2023;31:86-95. doi: 10.1109/taslp.2022.3209937. Epub 2022 Oct 10.
Spectro-temporal dynamics of consonant-vowel (CV) transition regions are considered to provide robust cues related to articulation. In this work, we propose an objective measure of precise articulation, dubbed the objective articulation measure (OAM), by analyzing the CV transitions segmented around vowel onsets. The OAM is derived from the posteriors of a convolutional neural network pre-trained to classify consonants using CV regions as input. We demonstrate that the OAM correlates with perceptual measures in a variety of contexts, including (a) adult dysarthric speech, (b) the speech of children with cleft lip/palate, and (c) a database of accented English speech from native Mandarin and Spanish speakers.
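The abstract names the ingredients of the OAM (CV windows cut around vowel onsets, and posteriors from a consonant classifier) but not the exact formula. Below is a minimal sketch of that pipeline under stated assumptions. Vowel onset times are taken as given. The window lengths `pre_s`/`post_s` are hypothetical. The trained CNN is not included; its logits are assumed to be available. The per-speaker score is simply the mean posterior of the intended consonant, a goodness-of-pronunciation-style stand-in and not the authors' OAM definition. All function names are illustrative.

```python
import numpy as np

def extract_cv_windows(signal, sr, vowel_onsets_s, pre_s=0.10, post_s=0.05):
    # Hypothetical window sizes: pre_s seconds before each vowel onset (consonant
    # release and transition) and post_s seconds after it (start of the vowel).
    pre = int(round(pre_s * sr))
    post = int(round(post_s * sr))
    windows = []
    for t in vowel_onsets_s:
        center = int(round(t * sr))
        start, end = center - pre, center + post
        if start < 0 or end > len(signal):
            continue  # skip onsets whose window would run off either end
        windows.append(signal[start:end])
    if not windows:
        return np.empty((0, pre + post))
    return np.stack(windows)  # shape: (num_segments, pre + post)

def softmax(logits):
    # Row-wise softmax over classifier logits, shifted by the row max for stability.
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def posterior_score(posteriors, target_ids):
    # Stand-in score (assumption): average posterior assigned to the intended
    # consonant of each CV segment. Higher means the classifier recovers the
    # target consonant more confidently.
    p = np.asarray(posteriors, dtype=float)
    idx = np.asarray(target_ids, dtype=int)
    return float(p[np.arange(len(idx)), idx].mean())
```

For example, with 1 s of audio at 16 kHz and onsets at 0.05 s, 0.5 s, and 0.98 s, only the 0.5 s onset yields a full window. The other two would extend past the signal edges and are skipped rather than zero-padded, which is a design choice of this sketch.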