Automatic Multiple Articulator Segmentation in Dynamic Speech MRI Using a Protocol Adaptive Stacked Transfer Learning U-NET Model

Author information

Erattakulangara Subin, Kelat Karthika, Meyer David, Priya Sarv, Lingala Sajan Goud

Affiliations

Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA.

Janette Ogg Voice Research Center, Shenandoah University, Winchester, VA 22601, USA.

Publication information

Bioengineering (Basel). 2023 May 22;10(5):623. doi: 10.3390/bioengineering10050623.

DOI:10.3390/bioengineering10050623

PMID:37237693

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10215398/

Abstract

Dynamic magnetic resonance imaging has emerged as a powerful modality for investigating upper-airway function during speech production. Analyzing the changes in the vocal tract airspace, including the position of soft-tissue articulators (e.g., the tongue and velum), enhances our understanding of speech production. The advent of various fast speech MRI protocols based on sparse sampling and constrained reconstruction has led to the creation of dynamic speech MRI datasets on the order of 80-100 image frames/second. In this paper, we propose a stacked transfer learning U-NET model to segment the deforming vocal tract in 2D mid-sagittal slices of dynamic speech MRI. Our approach leverages (a) low- and mid-level features and (b) high-level features. The low- and mid-level features are derived from models pre-trained on labeled open-source brain tumor MR and lung CT datasets, and an in-house airway labeled dataset. The high-level features are derived from labeled protocol-specific MR images. The applicability of our approach to segmenting dynamic datasets is demonstrated in data acquired from three fast speech MRI protocols: Protocol 1: 3 T-based radial acquisition scheme coupled with a non-linear temporal regularizer, where speakers were producing French speech tokens; Protocol 2: 1.5 T-based uniform density spiral acquisition scheme coupled with a temporal finite difference (FD) sparsity regularization, where speakers were producing fluent speech tokens in English; and Protocol 3: 3 T-based variable density spiral acquisition scheme coupled with manifold regularization, where speakers were producing various speech tokens from the International Phonetic Alphabet (IPA). Segmentations from our approach were compared to those from an expert human user (a vocologist) and the conventional U-NET model without transfer learning. Segmentations from a second expert human user (a radiologist) were used as ground truth. Evaluations were performed using the quantitative DICE similarity metric, the Hausdorff distance metric, and the segmentation count metric. This approach was successfully adapted to different speech MRI protocols with only a handful of protocol-specific images (e.g., on the order of 20 images), and provided accurate segmentations similar to those of an expert human.
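The two standard overlap and boundary metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the helper names (`mask_to_points`, `dice`, `hausdorff`) and the toy masks are hypothetical, and the Hausdorff distance is computed here over all foreground pixels in pixel units, whereas the paper may compute it on contours or in physical units. The segmentation count metric is not defined in the abstract and is therefore omitted.

```python
from math import dist


def mask_to_points(mask):
    """Return the set of (row, col) coordinates of foreground pixels."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def dice(a, b):
    """Dice similarity between two binary masks (1.0 means identical)."""
    pa, pb = mask_to_points(a), mask_to_points(b)
    if not pa and not pb:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(pa & pb) / (len(pa) + len(pb))


def hausdorff(a, b):
    """Symmetric Hausdorff distance (in pixels) between foreground pixel sets."""
    pa, pb = mask_to_points(a), mask_to_points(b)
    if not pa or not pb:
        raise ValueError("Hausdorff distance needs two non-empty masks")

    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(dist(p, q) for q in dst) for p in src)

    return max(directed(pa, pb), directed(pb, pa))


# Toy 3x4 masks (hypothetical data): a reference and a predicted segmentation.
reference = [[0, 1, 1, 0],
             [0, 1, 1, 0],
             [0, 0, 0, 0]]
predicted = [[0, 1, 1, 1],
             [0, 1, 0, 0],
             [0, 0, 0, 0]]

print(dice(reference, predicted))       # 0.75
print(hausdorff(reference, predicted))  # 1.0
```

In this toy case three of the four foreground pixels overlap, giving a Dice score of 2·3/(4+4) = 0.75, and the worst-placed pixel in either mask lies one pixel from the other mask, giving a Hausdorff distance of 1.0. The brute-force nearest-point search is quadratic in the number of foreground pixels, which is acceptable for a sketch but would likely be replaced by a distance-transform or library routine for full image series.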

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d900/10215398/cad2aaaccec5/bioengineering-10-00623-g001.jpg

Similar articles

Deep-learning-based segmentation of the vocal tract and articulators in real-time magnetic resonance images of speech.

Comput Methods Programs Biomed. 2021 Jan;198:105814. doi: 10.1016/j.cmpb.2020.105814. Epub 2020 Oct 26.

Prospectively accelerated dynamic speech magnetic resonance imaging at 3 T using a self-navigated spiral-based manifold regularized scheme.

NMR Biomed. 2024 Aug;37(8):e5135. doi: 10.1002/nbm.5135. Epub 2024 Mar 5.

Real-time speech MRI datasets with corresponding articulator ground-truth segmentations.

Sci Data. 2023 Dec 2;10(1):860. doi: 10.1038/s41597-023-02766-z.

Automatic segmentation of vocal tract articulators in real-time magnetic resonance imaging.

Comput Methods Programs Biomed. 2024 Jan;243:107907. doi: 10.1016/j.cmpb.2023.107907. Epub 2023 Nov 10.

Automatic upper airway segmentation in static and dynamic MRI via anatomy-guided convolutional neural networks.

Med Phys. 2022 Jan;49(1):324-342. doi: 10.1002/mp.15345. Epub 2021 Dec 2.

Using deep learning to segment breast and fibroglandular tissue in MRI volumes.

Med Phys. 2017 Feb;44(2):533-546. doi: 10.1002/mp.12079.

Stability of conventional and machine learning-based tumor auto-segmentation techniques using undersampled dynamic radial bSSFP acquisitions on a 0.35 T hybrid MR-linac system.

Med Phys. 2021 Feb;48(2):587-596. doi: 10.1002/mp.14659. Epub 2021 Jan 9.

Development of U-Net Breast Density Segmentation Method for Fat-Sat MR Images Using Transfer Learning Based on Non-Fat-Sat Model.

J Digit Imaging. 2021 Aug;34(4):877-887. doi: 10.1007/s10278-021-00472-z. Epub 2021 Jul 9.

A segmentation-informed deep learning framework to register dynamic two-dimensional magnetic resonance images of the vocal tract during speech.

Biomed Signal Process Control. 2023 Feb;80:104290. doi: 10.1016/j.bspc.2022.104290.

Cited by

Open-Source Manually Annotated Vocal Tract Database for Automatic Segmentation from 3D MRI Using Deep Learning: Benchmarking 2D and 3D Convolutional and Transformer Networks.

J Voice. 2025 Mar 5. doi: 10.1016/j.jvoice.2025.02.026.

Estimating Palatal and Pharyngeal Muscle Contraction in Hindi Syllable Pronunciation using Computational Modeling.

Indian J Plast Surg. 2024 Jul 18;57(Suppl 1):S24-S29. doi: 10.1055/s-0044-1788591. eCollection 2024 Dec.

Multi-label deep learning for comprehensive optic nerve head segmentation through data of fundus images.

Heliyon. 2024 Sep 1;10(18):e36996. doi: 10.1016/j.heliyon.2024.e36996. eCollection 2024 Sep 30.

A machine learning approach for vocal fold segmentation and disorder classification based on ensemble method.

Sci Rep. 2024 Jun 23;14(1):14435. doi: 10.1038/s41598-024-64987-5.
