Speech synthesis from ECoG using densely connected 3D convolutional neural networks.

Affiliation

Cognitive Systems Lab, University of Bremen, Bremen, Germany.

Publication information

J Neural Eng. 2019 Jun;16(3):036019. doi: 10.1088/1741-2552/ab0c59. Epub 2019 Mar 4.

DOI:10.1088/1741-2552/ab0c59
PMID:30831567
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6822609/
Abstract

OBJECTIVE

Direct synthesis of speech from neural signals could provide a fast and natural way of communication to people with neurological diseases. Invasively-measured brain activity (electrocorticography; ECoG) supplies the necessary temporal and spatial resolution to decode fast and complex processes such as speech production. A number of impressive advances in speech decoding using neural signals have been achieved in recent years, but the complex dynamics are still not fully understood. However, it is unlikely that simple linear models can capture the relation between neural activity and continuous spoken speech.

APPROACH

Here we show that deep neural networks can be used to map ECoG from speech production areas onto an intermediate representation of speech (logMel spectrogram). The proposed method uses a densely connected convolutional neural network topology which is well-suited to work with the small amount of data available from each participant.
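
The dense connectivity pattern named above can be illustrated with a toy sketch. It is not the authors' network: the actual model uses 3D convolutions over ECoG features, whereas the hypothetical `toy_layer` and `dense_block` below apply a random linear map with ReLU to plain lists, purely to show that each layer receives the concatenation of the block input and all earlier layer outputs.

```python
import random


def toy_layer(inputs, growth_rate, rng):
    """Hypothetical stand-in for a conv layer: random linear map + ReLU."""
    outputs = []
    for _ in range(growth_rate):
        weights = [rng.uniform(-1.0, 1.0) for _ in inputs]
        activation = sum(w * x for w, x in zip(weights, inputs))
        outputs.append(max(0.0, activation))  # ReLU
    return outputs


def dense_block(features, num_layers, growth_rate, seed=0):
    """Dense connectivity: each layer sees the block input plus the
    outputs of every earlier layer, concatenated together."""
    rng = random.Random(seed)
    collected = list(features)
    for _ in range(num_layers):
        new_features = toy_layer(collected, growth_rate, rng)
        collected = collected + new_features  # concatenate, do not overwrite
    return collected


block_output = dense_block([0.5, -1.2, 0.3, 0.8], num_layers=3, growth_rate=2)
print(len(block_output))  # 4 input features + 3 layers * 2 new features each
```

Because earlier features are reused rather than recomputed, each layer can stay narrow. That is one commonly cited reason dense connectivity is considered for small datasets.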

MAIN RESULTS

In a study with six participants, we achieved correlations up to r = 0.69 between the reconstructed and original logMel spectrograms. We transferred our prediction back into an audible waveform by applying a Wavenet vocoder. The vocoder was conditioned on logMel features that harnessed a much larger, pre-existing data corpus to provide the most natural acoustic output.
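
As an illustration of the evaluation metric, the sketch below computes a Pearson correlation between an original and a reconstructed spectrogram. The abstract does not say how correlations were aggregated, so computing r per spectral bin over time and then averaging is an assumption, as are the function names and the toy data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_bin_correlation(original, reconstructed):
    """Average Pearson r across spectral bins (assumed aggregation).

    Both arguments are lists of frames; each frame is a list of bin values.
    """
    num_bins = len(original[0])
    scores = []
    for b in range(num_bins):
        orig_track = [frame[b] for frame in original]
        reco_track = [frame[b] for frame in reconstructed]
        scores.append(pearson_r(orig_track, reco_track))
    return sum(scores) / num_bins


# Toy data: 4 time frames, 2 spectral bins (values are made up).
original = [[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]]
reconstructed = [[1.1, 3.5], [1.9, 3.2], [3.2, 2.4], [3.8, 0.9]]
score = mean_bin_correlation(original, reconstructed)
print(round(score, 3))
```

A bin whose values never change has zero variance and would cause a division by zero here. A real evaluation would need to handle that case explicitly.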

SIGNIFICANCE

To the best of our knowledge, this is the first time that high-quality speech has been reconstructed from neural recordings during speech production using deep neural networks.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/236a/6822609/4017dfbccc1f/nihms-1029540-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/236a/6822609/4de66e9129b9/nihms-1029540-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/236a/6822609/0609b710162c/nihms-1029540-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/236a/6822609/18238bbc04d7/nihms-1029540-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/236a/6822609/6238e91b3332/nihms-1029540-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/236a/6822609/bc2eea20cfaa/nihms-1029540-f0006.jpg

Similar articles

1
Speech synthesis from ECoG using densely connected 3D convolutional neural networks.
J Neural Eng. 2019 Jun;16(3):036019. doi: 10.1088/1741-2552/ab0c59. Epub 2019 Mar 4.
2
Speech Synthesis from Stereotactic EEG using an Electrode Shaft Dependent Multi-Input Convolutional Neural Network Approach.
Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:6045-6048. doi: 10.1109/EMBC46164.2021.9629711.
3
Speech decoding from a small set of spatially segregated minimally invasive intracranial EEG electrodes with a compact and interpretable neural network.
J Neural Eng. 2022 Nov 24;19(6). doi: 10.1088/1741-2552/aca1e1.
4
Iterative alignment discovery of speech-associated neural activity.
J Neural Eng. 2024 Aug 28;21(4):046056. doi: 10.1088/1741-2552/ad663c.
5
Spatial resolution dependence on spectral frequency in human speech cortex electrocorticography.
J Neural Eng. 2016 Oct;13(5):056013. doi: 10.1088/1741-2560/13/5/056013. Epub 2016 Aug 31.
6
Decoding micro-electrocorticographic signals by using explainable 3D convolutional neural network to predict finger movements.
J Neurosci Methods. 2024 Nov;411:110251. doi: 10.1016/j.jneumeth.2024.110251. Epub 2024 Aug 14.
7
High-resolution neural recordings improve the accuracy of speech decoding.
Nat Commun. 2023 Nov 6;14(1):6938. doi: 10.1038/s41467-023-42555-1.
8
Decoding of finger trajectory from ECoG using deep learning.
J Neural Eng. 2018 Jun;15(3):036009. doi: 10.1088/1741-2552/aa9dbe. Epub 2017 Nov 28.
9
Dynamic network modeling and dimensionality reduction for human ECoG activity.
J Neural Eng. 2019 Aug 14;16(5):056014. doi: 10.1088/1741-2552/ab2214.
10
Neural Tuning to Low-Level Features of Speech throughout the Perisylvian Cortex.
J Neurosci. 2017 Aug 16;37(33):7906-7920. doi: 10.1523/JNEUROSCI.0238-17.2017. Epub 2017 Jul 17.

Cited by

1
A supervised data-driven spatial filter denoising method for speech artifacts in intracranial electrophysiological recordings.
Imaging Neurosci (Camb). 2024 Oct 1;2. doi: 10.1162/imag_a_00301. eCollection 2024.
2
An instantaneous voice-synthesis neuroprosthesis.
Nature. 2025 Jun 12. doi: 10.1038/s41586-025-09127-3.
3
Synthesizing intelligible utterances from EEG of imagined speech.
Front Neurosci. 2025 Apr 17;19:1565848. doi: 10.3389/fnins.2025.1565848. eCollection 2025.
4
VocalMind: A Stereotactic EEG Dataset for Vocalized, Mimed, and Imagined Speech in Tonal Language.
Sci Data. 2025 Apr 19;12(1):657. doi: 10.1038/s41597-025-04741-2.
5
A streaming brain-to-voice neuroprosthesis to restore naturalistic communication.
Nat Neurosci. 2025 Apr;28(4):902-912. doi: 10.1038/s41593-025-01905-6. Epub 2025 Mar 31.
6
Transformer-based neural speech decoding from surface and depth electrode signals.
J Neural Eng. 2025 Jan 28;22(1):016017. doi: 10.1088/1741-2552/adab21.
7
Decoding speech intent from non-frontal cortical areas.
J Neural Eng. 2025 Feb 13;22(1):016024. doi: 10.1088/1741-2552/adaa20.
8
Prosodic Preferences of Surface Electromyography-based Subvocal Speech for People With Laryngectomy.
J Voice. 2024 Dec 5. doi: 10.1016/j.jvoice.2024.10.024.
9
Real-time detection of spoken speech from unlabeled ECoG signals: A pilot study with an ALS participant.
medRxiv. 2024 Sep 22:2024.09.18.24313755. doi: 10.1101/2024.09.18.24313755.
10
An instantaneous voice synthesis neuroprosthesis.
bioRxiv. 2024 Sep 20:2024.08.14.607690. doi: 10.1101/2024.08.14.607690.