A Neural Speech Decoding Framework Leveraging Deep Learning and Speech Synthesis.

Author information

Chen Xupeng, Wang Ran, Khalilian-Gourtani Amirhossein, Yu Leyao, Dugan Patricia, Friedman Daniel, Doyle Werner, Devinsky Orrin, Wang Yao, Flinker Adeen

Publication information

bioRxiv. 2023 Sep 17:2023.09.16.558028. doi: 10.1101/2023.09.16.558028.

Abstract

Decoding human speech from neural signals is essential for brain-computer interface (BCI) technologies that restore speech function in populations with neurological deficits. However, it remains a highly challenging task, compounded by the scarcity of neural signals paired with corresponding speech, the complexity and high dimensionality of the data, and the limited availability of public source code. Here, we present a novel deep learning-based neural speech decoding framework that includes an ECoG Decoder that translates electrocorticographic (ECoG) signals from the cortex into interpretable speech parameters and a novel differentiable Speech Synthesizer that maps speech parameters to spectrograms. We develop a companion audio-to-audio auto-encoder consisting of a Speech Encoder and the same Speech Synthesizer to generate reference speech parameters that facilitate ECoG Decoder training. This framework generates natural-sounding speech and is highly reproducible across a cohort of 48 participants. Among three neural network architectures for the ECoG Decoder, the 3D ResNet model has the best decoding performance (PCC=0.804) in predicting the original speech spectrogram, closely followed by the SWIN model (PCC=0.796). Our experimental results show that our models can decode speech with high correlation even when limited to only causal operations, which is necessary for adoption by real-time neural prostheses. We successfully decode speech in participants with either left or right hemisphere coverage, which could lead to speech prostheses in patients with speech deficits resulting from left hemisphere damage. Further, we use an occlusion analysis to identify cortical regions contributing to speech decoding across our models. Finally, we provide open-source code for our two-stage training pipeline along with associated preprocessing and visualization tools to enable reproducible research and drive research across the speech science and prostheses communities.
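The decoding performance above is reported as a Pearson correlation coefficient (PCC) between the decoded and original speech spectrograms. A minimal sketch of such a metric follows, assuming the correlation is taken over all time-frequency bins of a single spectrogram pair; the abstract does not state how the PCC is aggregated across samples or participants, so the function name `spectrogram_pcc` and the flattening choice are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np


def spectrogram_pcc(decoded: np.ndarray, reference: np.ndarray) -> float:
    """Pearson correlation between two spectrograms of identical shape.

    Assumption (not from the abstract): both arrays are flattened and
    correlated over all time-frequency bins at once.
    """
    if decoded.shape != reference.shape:
        raise ValueError("spectrograms must have the same shape")
    x = decoded.ravel().astype(np.float64)
    y = reference.ravel().astype(np.float64)
    # Center both signals, then normalize the inner product.
    x = x - x.mean()
    y = y - y.mean()
    denom = np.sqrt((x * x).sum() * (y * y).sum())
    if denom == 0.0:
        raise ValueError("PCC is undefined for a constant spectrogram")
    return float((x * y).sum() / denom)


if __name__ == "__main__":
    # Synthetic stand-in data, shaped (frequency bins, time frames).
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(128, 200))
    decoded = reference + 0.5 * rng.normal(size=reference.shape)
    print(f"PCC = {spectrogram_pcc(decoded, reference):.3f}")
```

A PCC of 1 indicates a perfect linear match between the decoded and original spectrograms, so the metric is insensitive to a global gain or offset in the decoded output.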
