Contrastive Learning Approach for Assessment of Phonological Precision in Patients with Tongue Cancer Using MRI Data.

Authors

Arias-Vergara Tomás, Pérez-Toro Paula Andrea, Liu Xiaofeng, Xing Fangxu, Stone Maureen, Zhuo Jiachen, Prince Jerry L, Schuster Maria, Nöth Elmar, Woo Jonghye, Maier Andreas

Affiliations

Pattern Recognition Lab, Friedrich-Alexander University, Erlangen, Germany.

Massachusetts General Hospital - Harvard Medical School, Boston, MA, USA.

Publication information

Interspeech. 2024 Sep;2024:927-931. doi: 10.21437/interspeech.2024-2236.

DOI:10.21437/interspeech.2024-2236

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11671147/

Abstract

Magnetic Resonance Imaging (MRI) enables the analysis of speech production by capturing high-resolution images of the dynamic processes in the vocal tract. In clinical applications, combining MRI with synchronized speech recordings leads to improved patient outcomes, especially if a phonologically based approach is used for assessment. However, when audio signals are unavailable and only MRI data can be used, the recognition accuracy of speech sounds decreases. We propose a contrastive learning approach to improve the detection of phonological classes from MRI data when acoustic signals are not available at inference time. We demonstrate that frame-wise recognition of phonological classes improves from an F1 score of 0.74 to 0.85 when the contrastive loss is used. Furthermore, we show the utility of our approach in a clinical application, using these phonological classes to assess speech disorders in patients with tongue cancer, with promising results in the recognition task.
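The abstract does not say which contrastive objective is used. As a rough illustration only, the sketch below assumes an InfoNCE-style loss that pulls each MRI frame embedding toward the embedding of its time-aligned audio frame and pushes it away from the other frames in the batch. Under that assumption, audio is needed only during training, and the MRI encoder alone would be used at inference time. The function name `info_nce_loss`, the `temperature` value, and the embedding shapes are hypothetical and not taken from the paper.

```python
import numpy as np


def info_nce_loss(mri_emb, audio_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss for paired frame embeddings.

    mri_emb, audio_emb: arrays of shape (n_frames, dim); row i of each
    array is assumed to come from the same time frame (a positive pair).
    This is an illustrative assumption, not the paper's stated loss.
    """
    # L2-normalize so that dot products are cosine similarities.
    mri = mri_emb / np.linalg.norm(mri_emb, axis=1, keepdims=True)
    audio = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by the temperature.
    logits = mri @ audio.T / temperature

    def diagonal_cross_entropy(scores):
        # Numerically stable log-softmax over each row.
        shifted = scores - scores.max(axis=1, keepdims=True)
        log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        # The matching frame sits on the diagonal.
        return -np.mean(np.diag(log_prob))

    # Average the MRI-to-audio and audio-to-MRI directions.
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(logits.T))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mri = rng.normal(size=(8, 16))                   # toy MRI frame embeddings
    audio = mri + 0.01 * rng.normal(size=(8, 16))    # toy aligned audio embeddings
    print("aligned pairs:   ", info_nce_loss(mri, audio))
    print("misaligned pairs:", info_nce_loss(mri, np.roll(audio, 1, axis=0)))
```

With toy data like this, correctly aligned pairs should give a lower loss than pairs shifted by one frame, which is the property a contrastive objective of this kind relies on.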

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9b6f/11671147/d99fbc2fb052/nihms-2002850-f0001.jpg
