IEEE Trans Neural Syst Rehabil Eng. 2024;32:3432-3441. doi: 10.1109/TNSRE.2024.3457313. Epub 2024 Sep 18.
Speech brain-computer interfaces (speech BCIs), which convert brain signals into spoken words or sentences, have demonstrated great potential for high-performance BCI communication. Phonemes are the basic units of pronunciation. For monosyllabic languages such as Mandarin Chinese, where a word usually contains fewer than three phonemes, accurate phoneme decoding plays a vital role. We found that in the neural representation space, phonemes with similar pronunciations are often inseparable, leading to confusion in phoneme classification.
We mapped the neural signals of phoneme pronunciation into a hyperbolic space for a more distinct phoneme representation. Critically, we proposed a hyperbolic hierarchical clustering approach to specifically learn a phoneme-level structure to guide the representation.
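As a rough illustration of what mapping features into a hyperbolic space can involve, the sketch below projects Euclidean feature vectors into a Poincaré ball and measures geodesic distance there. The abstract does not state which hyperbolic model, curvature, or feature extractor was used, so the Poincaré ball, the curvature parameter `c`, and the function names `expmap0` and `poincare_distance` are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np

def expmap0(v, c=1.0, eps=1e-9):
    """Exponential map at the origin: send Euclidean vectors into the
    Poincare ball of curvature -c (assumed model, not from the paper)."""
    v = np.asarray(v, dtype=float)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    sqrt_c = np.sqrt(c)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def poincare_distance(x, y, c=1.0, eps=1e-9):
    """Geodesic distance between points x and y inside the Poincare ball."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sq_diff = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - c * np.sum(x ** 2, axis=-1)) * (1.0 - c * np.sum(y ** 2, axis=-1))
    arg = 1.0 + 2.0 * c * sq_diff / np.maximum(denom, eps)
    return np.arccosh(arg) / np.sqrt(c)

# Hypothetical 2-D features for two similar phonemes (made-up values).
feat_a = expmap0([1.5, 0.1])
feat_b = expmap0([1.5, -0.1])
gap = poincare_distance(feat_a, feat_b)
```

Distances in the ball grow rapidly as points approach the boundary, which is one reason hyperbolic embeddings are often used for tree-like or hierarchical structure. Whether this is the mechanism behind the separation reported here is not stated in the abstract.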
We found that this representation increased the distance between similar phonemes, effectively reducing confusion. In the phoneme decoding task, our approach achieved an average accuracy of 75.21% for 21 phonemes and outperformed existing methods across different experimental days.
Our approach showed high accuracy in phoneme classification. By learning the phoneme-level neural structure, the representations of neural signals were more discriminative and interpretable.
Our approach can potentially facilitate high-performance speech BCIs for Chinese and other monosyllabic languages.