Arias-Vergara Tomás, Pérez-Toro Paula Andrea, Liu Xiaofeng, Xing Fangxu, Stone Maureen, Zhuo Jiachen, Prince Jerry L, Schuster Maria, Nöth Elmar, Woo Jonghye, Maier Andreas
Pattern Recognition Lab. Friedrich-Alexander University, Erlangen, Germany.
Massachusetts General Hospital - Harvard Medical School, Boston, MA, USA.
Interspeech. 2024 Sep;2024:927-931. doi: 10.21437/interspeech.2024-2236.
Magnetic Resonance Imaging (MRI) enables the analysis of speech production by capturing high-resolution images of the dynamic processes in the vocal tract. In clinical applications, combining MRI with synchronized speech recordings leads to improved patient outcomes, especially if a phonological-based approach is used for assessment. However, when audio signals are unavailable and only MRI data can be used, sound recognition accuracy decreases. We propose a contrastive learning approach to improve the detection of phonological classes from MRI data when acoustic signals are not available at inference time. We demonstrate that frame-wise recognition of phonological classes improves from an F1 score of 0.74 to 0.85 when the contrastive loss is applied. Furthermore, we show the utility of our approach in the clinical application of using such phonological classes to assess speech disorders in patients with tongue cancer, yielding promising results in the recognition task.
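To illustrate the kind of cross-modal objective the abstract describes, the following is a minimal sketch of a symmetric InfoNCE-style contrastive loss between paired MRI-frame and audio-frame embeddings. The function name, variable names, and temperature value are hypothetical assumptions for illustration, not taken from the paper.

```python
# Hypothetical sketch: cross-modal contrastive loss between time-aligned
# MRI-frame and audio-frame embeddings (symmetric InfoNCE). All names and
# hyperparameters are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn.functional as F


def cross_modal_contrastive_loss(mri_emb: torch.Tensor,
                                 audio_emb: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """Pull matching (MRI, audio) frame pairs together, push mismatched pairs apart.

    mri_emb, audio_emb: (batch, dim) embeddings of time-aligned frames.
    """
    # L2-normalize so the dot product becomes a cosine similarity.
    mri_emb = F.normalize(mri_emb, dim=-1)
    audio_emb = F.normalize(audio_emb, dim=-1)

    # Similarity matrix: entry (i, j) compares MRI frame i with audio frame j.
    logits = mri_emb @ audio_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric InfoNCE: identify the matching audio for each MRI frame and vice versa.
    loss_m2a = F.cross_entropy(logits, targets)
    loss_a2m = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_m2a + loss_a2m)


# Usage sketch: during training, this term would be added to the frame-wise
# phonological classification loss; at inference only the MRI branch is needed.
# total_loss = classification_loss + lam * cross_modal_contrastive_loss(m, a)
```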