Iterative alignment discovery of speech-associated neural activity.

Author information

Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218, United States of America.

Department of Neurology, Johns Hopkins Medicine, Baltimore, MD 21287, United States of America.

Publication information

J Neural Eng. 2024 Aug 28;21(4):046056. doi: 10.1088/1741-2552/ad663c.

DOI:10.1088/1741-2552/ad663c

PMID:39194182

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11351572/

Abstract

Objective. Brain-computer interfaces (BCIs) have the potential to preserve or restore speech in patients with neurological disorders that weaken the muscles involved in speech production. However, successful training of low-latency speech synthesis and recognition models requires alignment of neural activity with intended phonetic or acoustic output with high temporal precision. This is particularly challenging in patients who cannot produce audible speech, as ground truth with which to pinpoint neural activity synchronized with speech is not available.

Approach. In this study, we present a new iterative algorithm for neural voice activity detection (nVAD) called iterative alignment discovery dynamic time warping (IAD-DTW) that integrates DTW into the loss function of a deep neural network (DNN). The algorithm is designed to discover the alignment between a patient's electrocorticographic (ECoG) neural responses and their attempts to speak during collection of data for training BCI decoders for speech synthesis and recognition.

Main results. To demonstrate the effectiveness of the algorithm, we tested its accuracy in predicting the onset and duration of acoustic signals produced by able-bodied patients with intact speech undergoing short-term diagnostic ECoG recordings for epilepsy surgery. We simulated a lack of ground truth by randomly perturbing the temporal correspondence between neural activity and an initial single estimate for all speech onsets and durations. We examined the model's ability to overcome these perturbations to estimate ground truth. IAD-DTW showed no notable degradation (<1% absolute decrease in accuracy) in performance in these simulations, even in the case of maximal misalignments between speech and silence.

Significance. IAD-DTW is computationally inexpensive and can be easily integrated into existing DNN-based nVAD approaches, as it pertains only to the final loss computation. This approach makes it possible to train speech BCI algorithms using ECoG data from patients who are unable to produce audible speech, including those with Locked-In Syndrome.
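As a rough illustration of the alignment idea, the sketch below uses classic dynamic time warping to warp a misaligned initial speech/silence estimate onto frame-wise speech probabilities. It is not the authors' IAD-DTW implementation: the abstract places DTW inside the DNN loss, whereas this example runs plain DTW outside any network. The function names (`dtw_path`, `realign_labels`) and all numeric values are hypothetical and chosen only for illustration.

```python
def dtw_path(pred, labels):
    """Classic DTW between frame-wise speech probabilities and 0/1 labels.

    Returns the total alignment cost and the warping path as (i, j) pairs,
    where i indexes `pred` and j indexes `labels`.
    """
    n, m = len(pred), len(labels)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of pred[:i] with labels[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(pred[i - 1] - labels[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end of both sequences to their start.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path


def realign_labels(pred, labels):
    """Warp `labels` onto the time axis of `pred` (majority vote per frame)."""
    _, path = dtw_path(pred, labels)
    votes = [[] for _ in pred]
    for i, j in path:
        votes[i].append(labels[j])
    return [1 if 2 * sum(v) >= len(v) else 0 for v in votes]


# Hypothetical frame-wise speech probabilities from an nVAD model.
pred = [0.1, 0.1, 0.2, 0.9, 0.8, 0.9, 0.1, 0.1]
# Hypothetical initial estimate whose speech segment starts two frames early.
initial_labels = [0, 1, 1, 1, 0, 0, 0, 0]

warped = realign_labels(pred, initial_labels)
print(warped)  # speech segment shifted to frames 3-5
```

In an iterative scheme of the kind the abstract describes, warped labels like these could serve as the training targets for the next round, so that the alignment and the model improve together. That loop is an assumption about how such a procedure might be organized, not a detail given in the abstract.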


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9eb1/11351572/77d2b7bb0dd1/jnead663cf1_hr.jpg

Similar articles

Stability of ECoG high gamma signals during speech and implications for a speech BCI system in an individual with ALS: a year-long longitudinal study.

J Neural Eng. 2024 Jul 12;21(4). doi: 10.1088/1741-2552/ad5c02.

Cost-effectiveness of using prognostic information to select women with breast cancer for adjuvant systemic therapy.

Health Technol Assess. 2006 Sep;10(34):iii-iv, ix-xi, 1-204. doi: 10.3310/hta10340.

A systematic review of speech, language and communication interventions for children with Down syndrome from 0 to 6 years.

Int J Lang Commun Disord. 2022 Mar;57(2):441-463. doi: 10.1111/1460-6984.12699. Epub 2022 Feb 22.

Towards real time efficient and robust ECoG decoding for mobile brain-computer interface.

J Neural Eng. 2025 Jul 10;22(4). doi: 10.1088/1741-2552/ade917.

Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis.

Cochrane Database Syst Rev. 2021 Apr 19;4(4):CD011535. doi: 10.1002/14651858.CD011535.pub4.

Intravenous magnesium sulphate and sotalol for prevention of atrial fibrillation after coronary artery bypass surgery: a systematic review and economic evaluation.

Health Technol Assess. 2008 Jun;12(28):iii-iv, ix-95. doi: 10.3310/hta12280.

Lamotrigine versus carbamazepine monotherapy for epilepsy: an individual participant data review.

Cochrane Database Syst Rev. 2018 Jun 28;6(6):CD001031. doi: 10.1002/14651858.CD001031.pub4.

Carbamazepine versus phenytoin monotherapy for epilepsy: an individual participant data review.

Cochrane Database Syst Rev. 2017 Feb 27;2(2):CD001911. doi: 10.1002/14651858.CD001911.pub3.

Home treatment for mental health problems: a systematic review.

Health Technol Assess. 2001;5(15):1-139. doi: 10.3310/hta5150.

Cited by

Real-time detection of spoken speech from unlabeled ECoG signals: A pilot study with an ALS participant.

medRxiv. 2024 Sep 22:2024.09.18.24313755. doi: 10.1101/2024.09.18.24313755.

References

Stable Decoding from a Speech BCI Enables Control for an Individual with ALS without Recalibration for 3 Months.

Adv Sci (Weinh). 2023 Dec;10(35):e2304853. doi: 10.1002/advs.202304853. Epub 2023 Oct 24.

A high-performance neuroprosthesis for speech decoding and avatar control.

Nature. 2023 Aug;620(7976):1037-1046. doi: 10.1038/s41586-023-06443-4. Epub 2023 Aug 23.

A high-performance speech neuroprosthesis.

Nature. 2023 Aug;620(7976):1031-1036. doi: 10.1038/s41586-023-06377-x. Epub 2023 Aug 23.

Generalizable spelling using a speech neuroprosthesis in an individual with severe limb and vocal paralysis.

Nat Commun. 2022 Nov 8;13(1):6510. doi: 10.1038/s41467-022-33611-3.

Understanding how the human brain tracks emitted speech sounds to execute fluent speech production.

PLoS Biol. 2022 Feb 4;20(2):e3001533. doi: 10.1371/journal.pbio.3001533. eCollection 2022 Feb.

Brain-Computer Interface: Applications to Speech Decoding and Synthesis to Augment Communication.

Neurotherapeutics. 2022 Jan;19(1):263-273. doi: 10.1007/s13311-022-01190-2. Epub 2022 Jan 31.

Imagined speech can be decoded from low- and cross-frequency intracranial EEG features.

Nat Commun. 2022 Jan 10;13(1):48. doi: 10.1038/s41467-021-27725-3.

Zoom disrupts the rhythm of conversation.

J Exp Psychol Gen. 2022 Jun;151(6):1272-1282. doi: 10.1037/xge0001150. Epub 2021 Nov 8.

Neuroprosthesis for Decoding Speech in a Paralyzed Person with Anarthria.

N Engl J Med. 2021 Jul 15;385(3):217-227. doi: 10.1056/NEJMoa2027540.

High-performance brain-to-text communication via handwriting.

Nature. 2021 May;593(7858):249-254. doi: 10.1038/s41586-021-03506-2. Epub 2021 May 12.
