

Similar Articles

1. Speech Motion Anomaly Detection via Cross-Modal Translation of 4D Motion Fields from Tagged MRI. Proc SPIE Int Soc Opt Eng. 2024 Feb;12926. doi: 10.1117/12.3006874. Epub 2024 May 1.
2. Synthesizing Audio from Tongue Motion During Speech Using Tagged MRI Via Transformer. Proc SPIE Int Soc Opt Eng. 2023 Feb;12464. doi: 10.1117/12.2653345. Epub 2023 Apr 3.
3. Tagged-MRI Sequence to Audio Synthesis via Self Residual Attention Guided Heterogeneous Translator. Med Image Comput Comput Assist Interv. 2022 Sep;13436:376-386. doi: 10.1007/978-3-031-16446-0_36. Epub 2022 Sep 17.
4. Speech Map: A Statistical Multimodal Atlas of 4D Tongue Motion During Speech from Tagged and Cine MR Images. Comput Methods Biomech Biomed Eng Imaging Vis. 2019;7(4):361-373. doi: 10.1080/21681163.2017.1382393. Epub 2017 Oct 9.
5. Speech Audio Synthesis from Tagged MRI and Non-Negative Matrix Factorization via Plastic Transformer. Med Image Comput Comput Assist Interv. 2023 Oct;14226:435-445. doi: 10.1007/978-3-031-43990-2_41. Epub 2023 Oct 1.
6. Deep learning in automatic detection of dysphonia: Comparing acoustic features and developing a generalizable framework. Int J Lang Commun Disord. 2023 Mar;58(2):279-294. doi: 10.1111/1460-6984.12783. Epub 2022 Sep 18.
7. CMRI2SPEC: Cine MRI Sequence to Spectrogram Synthesis via a Pairwise Heterogeneous Translator. Proc IEEE Int Conf Acoust Speech Signal Process. 2022 May;2022:1481-1485. doi: 10.1109/icassp43922.2022.9746381. Epub 2022 Apr 27.
8. Articulatory Underpinnings of Reduced Acoustic-Phonetic Contrasts in Individuals With Amyotrophic Lateral Sclerosis. Am J Speech Lang Pathol. 2022 Sep 7;31(5):2022-2044. doi: 10.1044/2022_AJSLP-22-00046. Epub 2022 Aug 16.
9. Recognizing Whispered Speech Produced by an Individual with Surgically Reconstructed Larynx Using Articulatory Movement Data. Workshop Speech Lang Process Assist Technol. 2016 Sep;2016:80-86. doi: 10.21437/SLPAT.2016-14.
10. Lung tumor segmentation in 4D CT images using motion convolutional neural networks. Med Phys. 2021 Nov;48(11):7141-7153. doi: 10.1002/mp.15204. Epub 2021 Sep 13.

References Cited in This Article

1. Speech Audio Synthesis from Tagged MRI and Non-Negative Matrix Factorization via Plastic Transformer. Med Image Comput Comput Assist Interv. 2023 Oct;14226:435-445. doi: 10.1007/978-3-031-43990-2_41. Epub 2023 Oct 1.
2. Synthesizing Audio from Tongue Motion During Speech Using Tagged MRI Via Transformer. Proc SPIE Int Soc Opt Eng. 2023 Feb;12464. doi: 10.1117/12.2653345. Epub 2023 Apr 3.
3. Tagged-MRI Sequence to Audio Synthesis via Self Residual Attention Guided Heterogeneous Translator. Med Image Comput Comput Assist Interv. 2022 Sep;13436:376-386. doi: 10.1007/978-3-031-16446-0_36. Epub 2022 Sep 17.
4. CMRI2SPEC: Cine MRI Sequence to Spectrogram Synthesis via a Pairwise Heterogeneous Translator. Proc IEEE Int Conf Acoust Speech Signal Process. 2022 May;2022:1481-1485. doi: 10.1109/icassp43922.2022.9746381. Epub 2022 Apr 27.
5. A deep joint sparse non-negative matrix factorization framework for identifying the common and subject-specific functional units of tongue motion during speech. Med Image Anal. 2021 Aug;72:102131. doi: 10.1016/j.media.2021.102131. Epub 2021 Jun 12.
6. Phase Vector Incompressible Registration Algorithm for Motion Estimation From Tagged Magnetic Resonance Images. IEEE Trans Med Imaging. 2017 Oct;36(10):2116-2128. doi: 10.1109/TMI.2017.2723021. Epub 2017 Jul 4.
7. 3D tongue motion from tagged and cine MR images. Med Image Comput Comput Assist Interv. 2013;16(Pt 3):41-8. doi: 10.1007/978-3-642-40760-4_6.
8. Semi-Automatic Segmentation of the Tongue for 3D Motion Analysis with Dynamic MRI. Proc IEEE Int Symp Biomed Imaging. 2013 Dec 31;2013:1465-1468. doi: 10.1109/ISBI.2013.6556811.

Speech Motion Anomaly Detection via Cross-Modal Translation of 4D Motion Fields from Tagged MRI

Author Information

Liu Xiaofeng, Xing Fangxu, Zhuo Jiachen, Stone Maureen, Prince Jerry L, El Fakhri Georges, Woo Jonghye

Affiliations

Gordon Center for Medical Imaging, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114 USA.

Dept. of Radiology, University of Maryland School of Medicine, Baltimore, MD 21201 USA.

Publication Information

Proc SPIE Int Soc Opt Eng. 2024 Feb;12926. doi: 10.1117/12.3006874. Epub 2024 May 1.

DOI: 10.1117/12.3006874
PMID: 39238547
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11377028/
Abstract

Understanding the relationship between tongue motion patterns during speech and their resulting speech acoustic outcomes, i.e., the articulatory-acoustic relation, is of great importance in assessing speech quality and developing innovative treatment and rehabilitative strategies. This is especially important when evaluating and detecting abnormal articulatory features in patients with speech-related disorders. In this work, we aim to develop a framework for detecting speech motion anomalies in conjunction with their corresponding speech acoustics. This is achieved through the use of a deep cross-modal translator trained on data from healthy individuals only, which bridges the gap between 4D motion fields obtained from tagged MRI and 2D spectrograms derived from speech acoustic data. The trained translator is used as an anomaly detector, by measuring the spectrogram reconstruction quality on healthy individuals or patients. In particular, the cross-modal translator is likely to yield limited generalization capabilities on patient data, which includes unseen out-of-distribution patterns, and therefore demonstrates subpar performance compared with healthy individuals. A one-class SVM is then used to distinguish the spectrograms of healthy individuals from those of patients. To validate our framework, we collected a total of 39 paired tagged MRI and speech waveforms, consisting of data from 36 healthy individuals and 3 tongue cancer patients. We used both 3D convolutional and transformer-based deep translation models, training them on the healthy training set and then applying them to both the healthy and patient testing sets. Our framework demonstrates a capability to detect abnormal patient data, thereby illustrating its potential in enhancing the understanding of the articulatory-acoustic relation for both healthy individuals and patients.
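The detection pipeline the abstract describes — train a translator on healthy subjects only, use its reconstruction quality as an anomaly signal, then fit a one-class SVM on that signal — can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the reconstruction errors below are simulated stand-ins for the translator's spectrogram reconstruction errors, and all sample sizes and error magnitudes are hypothetical.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical stand-in for the translator's per-subject spectrogram
# reconstruction error: healthy subjects (in-distribution) reconstruct
# well, patients (out-of-distribution) reconstruct poorly.
healthy_train_err = rng.normal(loc=0.10, scale=0.02, size=(30, 1))
healthy_test_err = rng.normal(loc=0.10, scale=0.02, size=(6, 1))
patient_test_err = rng.normal(loc=0.40, scale=0.05, size=(3, 1))

# Fit a one-class SVM on healthy reconstruction errors only; at test
# time, +1 means "looks healthy" and -1 flags an anomaly.
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
ocsvm.fit(healthy_train_err)

print(ocsvm.predict(healthy_test_err))  # mostly +1 (normal)
print(ocsvm.predict(patient_test_err))  # mostly -1 (anomalous)
```

The key design point carried over from the abstract is that no patient data is used for training: the detector learns only what "normal" reconstruction quality looks like, so unseen pathologies surface as outliers rather than requiring labeled patient examples.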
