Decoding and synthesizing tonal language speech from brain activity.

Affiliations

Department of Neurosurgery, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai 200040, China.

National Center for Neurological Disorders, Shanghai 200052, China.

Publication information

Sci Adv. 2023 Jun 9;9(23):eadh0478. doi: 10.1126/sciadv.adh0478.

DOI: 10.1126/sciadv.adh0478

PMID: 37294753

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10256166/

Abstract

Recent studies have shown the feasibility of speech brain-computer interfaces (BCIs) as a clinically valid treatment for helping patients with communication disorders who speak nontonal languages restore their speech ability. However, a tonal language speech BCI is challenging because producing lexical tones requires additional precise control of laryngeal movements. Thus, the model should emphasize features from tone-related cortex. Here, we designed a modularized multistream neural network that directly synthesizes tonal language speech from intracranial recordings. The network decoded lexical tones and base syllables independently via parallel streams of neural network modules inspired by neuroscience findings. The speech was synthesized by combining tonal syllable labels with nondiscriminant speech neural activity. Compared to commonly used baseline models, our proposed models achieved higher performance with modest training data and computational costs. These findings suggest a potential strategy for tonal language speech restoration.
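The idea of decoding tones and base syllables in separate parallel streams and then merging the labels can be illustrated with a toy sketch. This is not the authors' network: each stream here is a nearest-centroid classifier swapped in for their neural network modules, the class names (NearestCentroidStream, MultiStreamDecoder) are hypothetical, the split of feature indices into "tone-related" and "articulatory" features is an assumption, the pinyin-plus-tone-number label format is an assumption, and the feature vectors are synthetic. The synthesis step that combines labels with nondiscriminant neural activity is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from math import dist


@dataclass
class NearestCentroidStream:
    """One decoding stream (hypothetical): a nearest-centroid classifier
    that reads only its own subset of feature indices."""
    channels: list[int]
    centroids: dict = field(default_factory=dict)

    def _select(self, x):
        # Keep only the features this stream is assumed to care about.
        return [x[c] for c in self.channels]

    def fit(self, trials, labels):
        sums, counts = {}, {}
        for x, y in zip(trials, labels):
            feats = self._select(x)
            acc = sums.setdefault(y, [0.0] * len(feats))
            for i, v in enumerate(feats):
                acc[i] += v
            counts[y] = counts.get(y, 0) + 1
        # Class centroid = mean feature vector of that class's trials.
        self.centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
        return self

    def predict(self, x):
        # Return the label whose centroid is closest in Euclidean distance.
        feats = self._select(x)
        return min(self.centroids, key=lambda y: dist(feats, self.centroids[y]))


@dataclass
class MultiStreamDecoder:
    """Two independent streams whose outputs are merged into a tonal syllable."""
    tone_stream: NearestCentroidStream
    syllable_stream: NearestCentroidStream

    def fit(self, trials, tones, syllables):
        # Each stream is trained separately on its own label set.
        self.tone_stream.fit(trials, tones)
        self.syllable_stream.fit(trials, syllables)
        return self

    def predict(self, x):
        # Merge the two labels as pinyin plus tone number, e.g. "ma" + 1 -> "ma1".
        return f"{self.syllable_stream.predict(x)}{self.tone_stream.predict(x)}"


# Synthetic 4-feature trials: indices 0-1 stand in for "tone-related"
# features, indices 2-3 for "articulatory" features (assumed for the demo).
trials = [[1, 0, 0, 1], [0, 1, 0, 1], [1, 0, 1, 0]]  # ma1, ma3, ba1 (no ba3)
tones = [1, 3, 1]
syllables = ["ma", "ma", "ba"]

decoder = MultiStreamDecoder(
    tone_stream=NearestCentroidStream(channels=[0, 1]),
    syllable_stream=NearestCentroidStream(channels=[2, 3]),
).fit(trials, tones, syllables)

print(decoder.predict([0.9, 0.1, 0.1, 0.8]))  # ma1
print(decoder.predict([0.1, 0.9, 0.9, 0.1]))  # ba3
```

The second call returns "ba3" although no ba3 trial was used for fitting, because each stream only has to separate its own classes (two tones, two syllables) rather than every tone-syllable combination. This is a property of the toy setup and says nothing about how the published model performs on real recordings.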

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bb5f/10256166/40033a578aed/sciadv.adh0478-f1.jpg

Similar articles

Decoding lexical tones and vowels in imagined tonal monosyllables using fNIRS signals.

J Neural Eng. 2022 Nov 10;19(6). doi: 10.1088/1741-2552/ac9e1d.

Towards Naturalistic Speech Decoding from Intracranial Brain Data.

Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:3100-3104. doi: 10.1109/EMBC48229.2022.9871301.

Effects of native language experience on Mandarin lexical tone processing in proficient second language learners.

Psychophysiology. 2019 Nov;56(11):e13448. doi: 10.1111/psyp.13448. Epub 2019 Jul 29.

The Effect of Speech Variability on Tonal Language Speakers' Second Language Lexical Tone Learning.

Front Psychol. 2018 Oct 23;9:1982. doi: 10.3389/fpsyg.2018.01982. eCollection 2018.

EEG-based Classification of Imaginary Mandarin Tones.

Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:3889-3892. doi: 10.1109/EMBC44109.2020.9176608.

Human cortical encoding of pitch in tonal and non-tonal languages.

Nat Commun. 2021 Feb 19;12(1):1161. doi: 10.1038/s41467-021-21430-x.

Decoding articulatory and phonetic components of naturalistic continuous speech from the distributed language network.

J Neural Eng. 2023 Aug 14;20(4). doi: 10.1088/1741-2552/ace9fb.

Brain2Char: a deep architecture for decoding text from brain recordings.

J Neural Eng. 2020 Dec 16;17(6). doi: 10.1088/1741-2552/abc742.

Music-to-language transfer effect: may melodic ability improve learning of tonal languages by native nontonal speakers?

Cogn Process. 2006 Sep;7(3):203-7. doi: 10.1007/s10339-006-0146-7. Epub 2006 Aug 8.

Cited by

An instantaneous voice-synthesis neuroprosthesis.

Nature. 2025 Jun 12. doi: 10.1038/s41586-025-09127-3.

Acoustic Inspired Brain-to-Sentence Decoder for Logosyllabic Language.

Cyborg Bionic Syst. 2025 Apr 29;6:0257. doi: 10.34133/cbsystems.0257. eCollection 2025.

VocalMind: A Stereotactic EEG Dataset for Vocalized, Mimed, and Imagined Speech in Tonal Language.

Sci Data. 2025 Apr 19;12(1):657. doi: 10.1038/s41597-025-04741-2.

Recent applications of EEG-based brain-computer-interface in the medical field.

Mil Med Res. 2025 Mar 24;12(1):14. doi: 10.1186/s40779-025-00598-z.

A Bibliometric Analysis of the Application of Brain-Computer Interface in Rehabilitation Medicine Over the Past 20 Years.

J Multidiscip Healthc. 2025 Mar 4;18:1297-1317. doi: 10.2147/JMDH.S509747. eCollection 2025.

[Applications and prospects of electroencephalography technology in neurorehabilitation assessment and treatment].

Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2024 Dec 25;41(6):1271-1278. doi: 10.7507/1001-5515.202404046.

Brain-computer Interaction in the Smart Era.

Curr Med Sci. 2024 Dec;44(6):1123-1131. doi: 10.1007/s11596-024-2927-6. Epub 2024 Sep 30.

An instantaneous voice synthesis neuroprosthesis.

bioRxiv. 2024 Sep 20:2024.08.14.607690. doi: 10.1101/2024.08.14.607690.

Large-scale foundation models and generative AI for BigData neuroscience.

Neurosci Res. 2024 Jun 17. doi: 10.1016/j.neures.2024.06.003.

Structural and temporal dynamics analysis of neural circuit from 2002 to 2022: A bibliometric analysis.

Heliyon. 2024 Jan 14;10(2):e24649. doi: 10.1016/j.heliyon.2024.e24649. eCollection 2024 Jan 30.
