Transformer-based neural speech decoding from surface and depth electrode signals.

Author information

Chen Junbo, Chen Xupeng, Wang Ran, Le Chenqian, Khalilian-Gourtani Amirhossein, Jensen Erika, Dugan Patricia, Doyle Werner, Devinsky Orrin, Friedman Daniel, Flinker Adeen, Wang Yao

Affiliations

Electrical and Computer Engineering Department, New York University, 370 Jay Street, Brooklyn, NY 11201, United States of America.

Neurology Department, New York University, 223 East 34th Street, Manhattan, NY 10016, United States of America.

Publication information

J Neural Eng. 2025 Jan 28;22(1):016017. doi: 10.1088/1741-2552/adab21.

Abstract

This study investigates speech decoding from neural signals captured by intracranial electrodes. Most prior work handles only electrodes on a 2D grid (i.e., an electrocorticographic (ECoG) array) and data from a single patient. We aim to design a deep-learning model architecture that can accommodate both surface ECoG and depth (stereotactic EEG, or sEEG) electrodes. The architecture should allow training on data from multiple participants with large variability in electrode placements. The model should not have subject-specific layers, and the trained model should perform well on participants unseen during training.

We propose a novel transformer-based model architecture named SwinTW that can work with arbitrarily positioned electrodes by leveraging their 3D locations on the cortex rather than their positions on a 2D grid. We train subject-specific models using data from a single participant and multi-subject models exploiting data from multiple participants.

The subject-specific models using only low-density 8 × 8 ECoG data achieved a high decoding Pearson correlation coefficient with the ground-truth spectrogram (PCC = 0.817) over N = 43 participants, significantly outperforming our prior convolutional ResNet model and the 3D Swin transformer model. Incorporating additional strip, depth, and grid electrodes available in each participant (N = 39) led to further improvement (PCC = 0.838). For participants with only sEEG electrodes (N = 9), subject-specific models still achieved comparable performance, with an average PCC = 0.798. A single multi-subject model trained on ECoG data from 15 participants yielded results (PCC = 0.837) comparable to those of 15 models trained individually for these participants (PCC = 0.831). Furthermore, the multi-subject models achieved high performance on unseen participants, with an average PCC = 0.765 in leave-one-out cross-validation.

The proposed SwinTW decoder enables future speech decoding approaches to use any electrode placement that is clinically optimal or feasible for a particular participant, including depth electrodes alone, which are more routinely implanted in chronic neurosurgical procedures. The success of the single multi-subject model when tested on participants within the training cohort demonstrates that the architecture can exploit data from multiple participants with diverse electrode placements. Its flexibility in training with both single-subject and multi-subject data, and with both grid and non-grid electrodes, ensures broad applicability. Importantly, the generalizability of the multi-subject models in our study population suggests that a model trained using paired acoustic and neural data from multiple patients could potentially be applied to new patients with speech disability for whom collecting paired acoustic-neural training data is not feasible.
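The two evaluation ideas named in the abstract, the Pearson correlation coefficient (PCC) between a decoded and a ground-truth spectrogram and leave-one-out cross-validation over participants, can be illustrated with a minimal sketch. This is not the authors' code. The abstract does not say how the PCC is aggregated (per trial, per frequency bin, or over the whole spectrogram), so the sketch assumes a single correlation over all time-frequency bins. The function names `spectrogram_pcc` and `leave_one_participant_out` are hypothetical.

```python
import numpy as np


def spectrogram_pcc(decoded, target):
    """Pearson correlation between two spectrograms of equal shape.

    Assumption: both arrays are flattened and correlated as a whole;
    the paper may aggregate differently.
    """
    x = np.asarray(decoded, dtype=float).ravel()
    y = np.asarray(target, dtype=float).ravel()
    if x.shape != y.shape:
        raise ValueError("spectrograms must have the same shape")
    # Center both signals, then normalize the inner product.
    x = x - x.mean()
    y = y - y.mean()
    denom = np.sqrt((x * x).sum() * (y * y).sum())
    if denom == 0.0:
        raise ValueError("PCC is undefined for a constant spectrogram")
    return float((x * y).sum() / denom)


def leave_one_participant_out(participant_ids):
    """Yield (training_ids, held_out_id) pairs, one fold per participant."""
    ids = list(participant_ids)
    for i, held_out in enumerate(ids):
        yield ids[:i] + ids[i + 1:], held_out
```

In such a setup, each fold would train a multi-subject model on `training_ids` and report `spectrogram_pcc` only on the held-out participant, so the averaged score reflects performance on participants unseen during training.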

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29ec/11773629/d07c755c995c/jneadab21f1_hr.jpg
