A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images.

Affiliations

Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA.

Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA.

Publication information

Sci Data. 2021 Jul 20;8(1):187. doi: 10.1038/s41597-021-00976-x.

DOI:10.1038/s41597-021-00976-x

PMID:34285240

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8292336/

Abstract

Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is, however, limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. Imaging the rapidly moving articulators and dynamic airway shaping during speech demands high spatio-temporal resolution and robust reconstruction methods. Further, while reconstructed images have been published, to date there is no open dataset providing raw multi-coil RT-MRI data from an optimized speech production experimental setup. Such datasets could enable new and improved methods for dynamic image reconstruction, artifact correction, feature extraction, and direct extraction of linguistically relevant biomarkers. The present dataset offers a unique corpus of 2D sagittal-view RT-MRI videos along with synchronized audio for 75 participants performing linguistically motivated speech tasks, alongside the corresponding public domain raw RT-MRI data. The dataset also includes 3D volumetric vocal tract MRI during sustained speech sounds and high-resolution static anatomical T2-weighted upper airway MRI for each participant.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5f1/8292336/0b7ea7ef546d/41597_2021_976_Fig1_HTML.jpg
