Synthesizing Audio from Tongue Motion During Speech Using Tagged MRI Via Transformer.

Authors

Liu Xiaofeng, Xing Fangxu, Prince Jerry L, Stone Maureen, Fakhri Georges El, Woo Jonghye

Affiliations

Gordon Center for Medical Imaging, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114 USA.

Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218 USA.

Publication information

Proc SPIE Int Soc Opt Eng. 2023 Feb;12464. doi: 10.1117/12.2653345. Epub 2023 Apr 3.

DOI: 10.1117/12.2653345
PMID: 38009135
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10669779/
Abstract

Investigating the relationship between internal tissue point motion of the tongue and oropharyngeal muscle deformation measured from tagged MRI and intelligible speech can aid in advancing speech motor control theories and developing novel treatment methods for speech-related disorders. However, elucidating the relationship between these two sources of information is challenging, due in part to the disparity in data structure between spatiotemporal motion fields (i.e., 4D motion fields) and one-dimensional audio waveforms. In this work, we present an efficient encoder-decoder translation network for exploring the predictive information inherent in 4D motion fields via 2D spectrograms as a surrogate of the audio data. Specifically, our encoder is based on 3D convolutional spatial modeling and transformer-based temporal modeling. The extracted features are processed by an asymmetric 2D convolution decoder to generate spectrograms that correspond to 4D motion fields. Furthermore, we incorporate a generative adversarial training approach into our framework to further improve the synthesis quality of our generated spectrograms. We experiment on 63 paired motion field sequences and speech waveforms, demonstrating that our framework enables the generation of clear audio waveforms from a sequence of motion fields. Thus, our framework has the potential to improve our understanding of the relationship between these two modalities and inform the development of treatments for speech disorders.
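
The abstract uses a 2D spectrogram as a surrogate for the one-dimensional audio waveform. The following is a minimal sketch of that conversion using a short-time Fourier transform in NumPy. It is an illustration only, not the authors' preprocessing: the function name `magnitude_spectrogram`, the 8 kHz sample rate, the 256-sample Hann window, and the 128-sample hop are all assumptions chosen for the example.

```python
import numpy as np


def magnitude_spectrogram(waveform, frame_len=256, hop=128):
    """Return a (freq_bins, frames) log-magnitude spectrogram of a 1D signal.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    waveform = np.asarray(waveform, dtype=np.float64)
    if waveform.ndim != 1 or waveform.size < frame_len:
        raise ValueError("expected a 1D signal with at least frame_len samples")

    window = np.hanning(frame_len)
    n_frames = 1 + (waveform.size - frame_len) // hop

    # Slice the signal into overlapping, windowed frames: (n_frames, frame_len).
    frames = np.stack(
        [waveform[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )

    # Real FFT of each frame gives frame_len // 2 + 1 frequency bins.
    spectrum = np.fft.rfft(frames, axis=1)

    # Log compression keeps the dynamic range manageable; transpose so that
    # rows are frequency bins and columns are time frames.
    return np.log1p(np.abs(spectrum)).T


# Hypothetical test signal: one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = magnitude_spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())
print(spec.shape, peak_bin * sample_rate / 256)
```

In this form the audio becomes an image-like array indexed by frequency and time, which is the kind of target a 2D convolutional decoder can produce. Recovering a waveform from such a magnitude spectrogram needs a separate inversion step that this sketch does not cover.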
