Chen Junbo, Chen Xupeng, Wang Ran, Le Chenqian, Khalilian-Gourtani Amirhossein, Jensen Erika, Dugan Patricia, Doyle Werner, Devinsky Orrin, Friedman Daniel, Flinker Adeen, Wang Yao
Electrical and Computer Engineering Department, New York University, 370 Jay Street, Brooklyn, 11201, NY, USA.
Neurology Department, New York University, 223 East 34th Street, Manhattan, 10016, NY, USA.
bioRxiv. 2024 Sep 25:2024.03.11.584533. doi: 10.1101/2024.03.11.584533.
This study investigates speech decoding from neural signals captured by intracranial electrodes. Most prior work is limited to electrodes arranged on a 2D grid (i.e., an electrocorticographic, or ECoG, array) and to data from a single patient. We aim to design a deep-learning model architecture that accommodates both surface (ECoG) and depth (stereotactic EEG, or sEEG) electrodes. The architecture should allow training on data from multiple participants with large variability in electrode placement, and the trained model should perform well on participants unseen during training.
We propose SwinTW, a novel transformer-based model architecture that can work with arbitrarily positioned electrodes by leveraging their 3D locations on the cortex rather than their positions on a 2D grid. We train subject-specific models using data from a single participant, as well as multi-patient models that exploit data from multiple participants.
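The core idea of conditioning on each electrode's 3D cortical coordinates, rather than a fixed 2D grid index, can be illustrated with a minimal sketch. This is not the authors' SwinTW implementation; all weights, dimensions, and function names below are hypothetical, and the attention layer is a bare single-head version for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16

def embed_electrodes(signals, coords, W_sig, W_pos):
    """One token per electrode: projected signal value plus projected 3D
    location. Because each token carries its own coordinates, any electrode
    count or placement (grid, strip, depth) yields a valid input set."""
    return signals[:, None] * W_sig + coords @ W_pos  # (n_elec, d_model)

def self_attention(X):
    """Single-head scaled dot-product self-attention with no learned
    projections (illustration only)."""
    scores = X @ X.T / np.sqrt(X.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X

# Hypothetical projection weights
W_sig = rng.standard_normal(d_model)
W_pos = rng.standard_normal((3, d_model))

# 70 arbitrarily placed electrodes: one signal value + normalized (x, y, z)
signals = rng.standard_normal(70)
coords = rng.random((70, 3))
tokens = embed_electrodes(signals, coords, W_sig, W_pos)
out = self_attention(tokens)
print(out.shape)  # (70, 16)
```

Nothing in this sketch assumes a rectangular layout: swapping in 12 depth-electrode contacts instead of 70 grid contacts requires no architectural change, which is the property that lets one model span ECoG and sEEG participants.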
The subject-specific models using only low-density 8x8 ECoG data achieved a high Pearson correlation coefficient between the decoded and ground-truth spectrograms (PCC=0.817) across N=43 participants, outperforming our prior convolutional ResNet model and the 3D Swin Transformer model. Incorporating the additional strip, depth, and grid electrodes available in each participant (N=39) led to further improvement (PCC=0.838). For participants with only sEEG electrodes (N=9), subject-specific models still achieved comparable performance, with an average PCC=0.798. The multi-subject models generalized well to unseen participants, achieving an average PCC=0.765 in leave-one-out cross-validation.
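The PCC metric used throughout these results measures the linear agreement between the decoded and ground-truth spectrograms. A minimal sketch of how such a score could be computed (the flattening over time and frequency is an assumption for illustration, not necessarily the authors' exact evaluation protocol):

```python
import numpy as np

def pearson_cc(decoded, target):
    """Pearson correlation coefficient between a decoded and a ground-truth
    spectrogram, flattened over the time and frequency axes."""
    x = decoded.ravel().astype(float)
    y = target.ravel().astype(float)
    x -= x.mean()
    y -= y.mean()
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

rng = np.random.default_rng(0)
target = rng.standard_normal((128, 64))                      # time x freq bins
decoded = target + 0.5 * rng.standard_normal(target.shape)   # noisy estimate
print(pearson_cc(decoded, target))
```

A PCC of 1.0 indicates a perfect linear match; the reported values around 0.8 correspond to decoded spectrograms that track the ground truth closely.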
The proposed SwinTW decoder enables future speech neuroprostheses to utilize any electrode placement that is clinically optimal or feasible for a particular participant, including the use of only depth electrodes, which are more routinely implanted in chronic neurosurgical procedures. Importantly, the generalizability of the multi-patient models suggests that such a model can be applied to new patients who lack paired acoustic and neural data, an advance for neuroprostheses in people with speech disability, for whom collecting such training data is not feasible.