Department of Neurosurgery, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai 200040, China; Shanghai Key Laboratory of Brain Function Restoration and Neural Regeneration, Shanghai 200040, China; National Center for Neurological Disorders, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai 200040, China.
School of Biomedical Engineering, ShanghaiTech University, Shanghai 201210, China; State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai 201210, China.
Cell Rep. 2024 Nov 26;43(11):114924. doi: 10.1016/j.celrep.2024.114924. Epub 2024 Oct 31.
Speech brain-computer interfaces (BCIs) directly translate brain activity into speech sound and text. Although speech BCIs have been applied successfully to non-tonal languages, tonal languages such as Mandarin Chinese pose distinct decoding challenges: their syllabic structure differs, and tonal nuances carry pivotal lexical information. Here, we designed a brain-to-text framework to decode Mandarin sentences from invasive neural recordings. Our framework dissects speech onset, base syllables, and lexical tones, and integrates them with contextual information through Bayesian likelihood and a Viterbi decoder. The results demonstrate accurate tone and syllable decoding during naturalistic speech production. Averaged across five participants, the overall word error rate (WER) for 10 offline-decoded tonal sentences built from a vocabulary of 40 high-frequency Chinese characters is 21% (chance: 95.3%), and tone decoding accuracy reaches 93% (chance: 25%), surpassing previous intracranial Mandarin tonal syllable decoders. This study provides a robust and generalizable approach for brain-to-text decoding of continuous tonal speech sentences.
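The abstract does not give the decoder's equations. As a rough, hypothetical illustration only, the following Python sketch shows one way per-position syllable and tone classifier outputs could be combined with a character bigram language model in a Viterbi search, and how a character-level error rate (Levenshtein distance divided by reference length) could score the result. Everything in it is an assumption rather than the study's method: the toy vocabulary, the helper names `viterbi_decode` and `char_error_rate`, the treatment of syllable and tone scores as independent, the bigram prior, the `floor` probability for unseen events, and the choice of one character per token when computing the error rate.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy vocabulary: character -> (base syllable, lexical tone 1-4).
# Illustrative only; this is NOT the study's 40-character vocabulary.
VOCAB: Dict[str, Tuple[str, int]] = {
    "妈": ("ma", 1),
    "麻": ("ma", 2),
    "马": ("ma", 3),
    "骂": ("ma", 4),
    "好": ("hao", 3),
}


def viterbi_decode(
    syllable_probs: Sequence[Dict[str, float]],
    tone_probs: Sequence[Dict[int, float]],
    bigram: Dict[Tuple[str, str], float],
    start: Dict[str, float],
    vocab: Dict[str, Tuple[str, int]],
    floor: float = 1e-6,
) -> List[str]:
    """Most likely character sequence under a toy model (assumption).

    Emission for character c at position t (log domain):
        log P(syllable(c) | neural_t) + log P(tone(c) | neural_t)
    i.e. syllable and tone classifier outputs are treated as independent.
    Classifier posteriors are used directly; dividing them by class priors
    ("scaled likelihoods") is a common alternative.
    Transitions come from a hypothetical character bigram language model.
    Missing probabilities fall back to `floor`.
    """
    n = len(syllable_probs)
    if n == 0:
        return []
    chars = list(vocab)

    def emit(t: int, c: str) -> float:
        syl, tone = vocab[c]
        return (math.log(syllable_probs[t].get(syl, floor))
                + math.log(tone_probs[t].get(tone, floor)))

    def trans(p: str, c: str) -> float:
        return math.log(bigram.get((p, c), floor))

    # score[t][c]: best log score of any path ending in character c at t.
    score = [{c: math.log(start.get(c, floor)) + emit(0, c) for c in chars}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, n):
        score.append({})
        back.append({})
        for c in chars:
            prev = max(chars, key=lambda p: score[t - 1][p] + trans(p, c))
            score[t][c] = score[t - 1][prev] + trans(prev, c) + emit(t, c)
            back[t][c] = prev

    # Trace back from the best final character.
    path = [max(chars, key=lambda c: score[n - 1][c])]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance over characters divided by reference length."""
    m, n = len(reference), len(hypothesis)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[n] / m if m else float(n > 0)


if __name__ == "__main__":
    # Made-up classifier outputs for a two-character utterance.
    syl = [{"ma": 0.95}, {"hao": 0.8, "ma": 0.2}]
    tone = [{3: 0.45, 4: 0.55}, {3: 0.9, 1: 0.1}]
    start = {c: 0.2 for c in VOCAB}
    lm = {("马", "好"): 0.5, ("骂", "好"): 0.01}
    hyp = viterbi_decode(syl, tone, lm, start, VOCAB)
    print(hyp)                                    # ['马', '好']
    print(char_error_rate("马好", "".join(hyp)))  # 0.0
```

In this made-up example the tone scores alone slightly favor tone 4 (骂) for the first character, but the bigram prior favoring 马 followed by 好 flips the decision. With an empty bigram table the decoder relies on the per-position neural evidence alone and returns 骂好, which `char_error_rate("马好", "骂好")` scores as 0.5.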