Automatic Multiple Articulator Segmentation in Dynamic Speech MRI Using a Protocol Adaptive Stacked Transfer Learning U-NET Model.

Author information

Erattakulangara Subin, Kelat Karthika, Meyer David, Priya Sarv, Lingala Sajan Goud

Affiliations

Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA.

Janette Ogg Voice Research Center, Shenandoah University, Winchester, VA 22601, USA.

Publication information

Bioengineering (Basel). 2023 May 22;10(5):623. doi: 10.3390/bioengineering10050623.

Abstract

Dynamic magnetic resonance imaging has emerged as a powerful modality for investigating upper-airway function during speech production. Analyzing the changes in the vocal tract airspace, including the position of soft-tissue articulators (e.g., the tongue and velum), enhances our understanding of speech production. The advent of various fast speech MRI protocols based on sparse sampling and constrained reconstruction has led to the creation of dynamic speech MRI datasets on the order of 80-100 image frames/second. In this paper, we propose a stacked transfer learning U-NET model to segment the deforming vocal tract in 2D mid-sagittal slices of dynamic speech MRI. Our approach leverages (a) low- and mid-level features and (b) high-level features. The low- and mid-level features are derived from models pre-trained on labeled open-source brain tumor MR and lung CT datasets, and on an in-house labeled airway dataset. The high-level features are derived from labeled protocol-specific MR images. The applicability of our approach to segmenting dynamic datasets is demonstrated on data acquired with three fast speech MRI protocols: Protocol 1: a 3 T radial acquisition scheme coupled with a non-linear temporal regularizer, where speakers produced French speech tokens; Protocol 2: a 1.5 T uniform-density spiral acquisition scheme coupled with temporal finite difference (FD) sparsity regularization, where speakers produced fluent speech tokens in English; and Protocol 3: a 3 T variable-density spiral acquisition scheme coupled with manifold regularization, where speakers produced various speech tokens from the International Phonetic Alphabet (IPA). Segmentations from our approach were compared to those from an expert human user (a vocologist) and to those from the conventional U-NET model without transfer learning. Segmentations from a second expert human user (a radiologist) were used as ground truth. Evaluations were performed using the quantitative DICE similarity metric, the Hausdorff distance metric, and a segmentation count metric. The approach was successfully adapted to different speech MRI protocols with only a handful of protocol-specific images (on the order of 20 images), and provided accurate segmentations similar to those of an expert human.
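
For readers unfamiliar with the first two evaluation metrics named above, the following is a minimal sketch of how a DICE score and a Hausdorff distance can be computed between a reference mask and a predicted mask. It is illustrative only and is not the authors' evaluation code. The function names, the toy masks, and the choice to measure the Hausdorff distance over all mask pixels in pixel units (rather than over contours or in millimetres) are assumptions made here. The segmentation count metric is not defined in the abstract and is therefore left out.

```python
from math import hypot


def mask_to_pixels(mask):
    """Convert a 2D binary mask (list of rows) to a set of (row, col) pixels."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def dice(a, b):
    """DICE similarity: 2|A ∩ B| / (|A| + |B|), for two sets of pixels."""
    if not a and not b:
        return 1.0  # assumption: two empty masks count as a perfect match
    return 2 * len(a & b) / (len(a) + len(b))


def directed_hausdorff(a, b):
    """Largest distance from any pixel in a to its nearest pixel in b."""
    return max(min(hypot(p[0] - q[0], p[1] - q[1]) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty pixel sets."""
    if not a or not b:
        raise ValueError("Hausdorff distance is undefined for an empty mask")
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Toy example (hypothetical data): the prediction is the reference
# shifted one pixel to the right.
reference = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
prediction = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]

ref_px = mask_to_pixels(reference)
pred_px = mask_to_pixels(prediction)

print("DICE:", dice(ref_px, pred_px))            # 0.5
print("Hausdorff:", hausdorff(ref_px, pred_px))  # 1.0
```

In this toy case half of the labeled pixels overlap, giving a DICE score of 0.5, and no pixel in either mask is more than one pixel away from the other mask, giving a Hausdorff distance of 1.0. The brute-force nearest-pixel search is quadratic in the number of pixels, which is acceptable for a sketch but would usually be replaced by a library routine for full-size images.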

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d900/10215398/cad2aaaccec5/bioengineering-10-00623-g001.jpg
