Speech synthesis from ECoG using densely connected 3D convolutional neural networks.

Affiliation

Cognitive Systems Lab, University of Bremen, Bremen, Germany.

Publication

J Neural Eng. 2019 Jun;16(3):036019. doi: 10.1088/1741-2552/ab0c59. Epub 2019 Mar 4.

Abstract

OBJECTIVE

Direct synthesis of speech from neural signals could provide a fast and natural means of communication for people with neurological diseases. Invasively measured brain activity (electrocorticography; ECoG) supplies the temporal and spatial resolution needed to decode fast and complex processes such as speech production. A number of impressive advances in speech decoding from neural signals have been achieved in recent years, but the complex dynamics are still not fully understood. Moreover, it is unlikely that simple linear models can capture the relation between neural activity and continuous spoken speech.

APPROACH

Here we show that deep neural networks can be used to map ECoG from speech production areas onto an intermediate representation of speech (logMel spectrogram). The proposed method uses a densely connected convolutional neural network topology, which is well suited to the small amount of data available from each participant.
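The abstract does not spell out the network, but the defining idea of a densely connected topology (as in DenseNet) is that each layer receives the concatenation of the block input and every earlier layer's output. The sketch below illustrates only that data flow, under loud assumptions: `toy_layer`, `dense_block`, and the growth rate are hypothetical names and values, and plain Python lists stand in for the 3D convolutions over electrode grid and time that the title refers to.

```python
from typing import Callable, List

Channel = List[float]        # one flattened feature map
FeatureMaps = List[Channel]  # a stack of channels (channel axis first)


def toy_layer(inputs: FeatureMaps, growth_rate: int) -> FeatureMaps:
    """Hypothetical stand-in for a normalization-ReLU-3D-convolution layer.

    Emits `growth_rate` new channels, each the ReLU of the channel-wise mean of
    all inputs minus a per-channel offset. It only illustrates the data flow.
    """
    n_positions = len(inputs[0])
    mean = [sum(ch[i] for ch in inputs) / len(inputs) for i in range(n_positions)]
    return [[max(0.0, v - k) for v in mean] for k in range(growth_rate)]


def dense_block(
    x: FeatureMaps,
    num_layers: int,
    growth_rate: int,
    layer: Callable[[FeatureMaps, int], FeatureMaps] = toy_layer,
) -> FeatureMaps:
    """Dense connectivity: every layer sees the block input plus all earlier
    layer outputs, and its own output is appended for the layers after it."""
    features = list(x)
    for _ in range(num_layers):
        new_channels = layer(features, growth_rate)  # input = all previous maps
        features = features + new_channels           # concatenate on channel axis
    return features


# Usage: 2 input channels over 3 positions, 3 layers with growth rate 4.
x = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
out = dense_block(x, num_layers=3, growth_rate=4)
print(len(out))  # 2 + 3 * 4 = 14 channels
```

Because earlier feature maps are reused rather than relearned, each layer can stay narrow (a small growth rate). That parameter efficiency is one reason commonly given for why such topologies cope with small per-participant datasets.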

MAIN RESULTS

In a study with six participants, we achieved correlations of up to r = 0.69 between the reconstructed and original logMel spectrograms. We transferred our predictions back into an audible waveform by applying a Wavenet vocoder. The vocoder was conditioned on logMel features, which allowed it to harness a much larger, pre-existing data corpus and thereby provide the most natural acoustic output.
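The abstract reports a correlation r between reconstructed and original logMel spectrograms but not how it is aggregated. The sketch below assumes one common scheme: a Pearson correlation per mel bin over time, averaged across bins. The helper names `pearson_r` and `spectrogram_correlation` are hypothetical, and plain Python is used for clarity; in practice `numpy.corrcoef` or `scipy.stats.pearsonr` would compute each per-bin coefficient.

```python
import math
from typing import List, Sequence

Spectrogram = List[List[float]]  # frames x mel bins


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def spectrogram_correlation(original: Spectrogram, reconstructed: Spectrogram) -> float:
    """Correlate each mel bin over time, then average across bins (assumed scheme)."""
    if len(original) != len(reconstructed):
        raise ValueError("spectrograms must have the same number of frames")
    n_bins = len(original[0])
    per_bin = [
        pearson_r([frame[b] for frame in original], [frame[b] for frame in reconstructed])
        for b in range(n_bins)
    ]
    return sum(per_bin) / n_bins


# Usage: bin 0 is anti-correlated (r = -1), bin 1 perfectly correlated (r = 1).
orig = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
recon = [[3.0, 2.0], [2.0, 4.0], [1.0, 6.0]]
print(spectrogram_correlation(orig, recon))  # 0.0
```

Because Pearson's r ignores offset and scale, a per-bin correlation rewards reconstructions that track the spectral dynamics over time even if their absolute levels differ from the original.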

SIGNIFICANCE

To the best of our knowledge, this is the first time that high-quality speech has been reconstructed from neural recordings during speech production using deep neural networks.
