Using Automatic Speech Recognition to Measure the Intelligibility of Speech Synthesized from Brain Signals.

Author Information

Varshney Suvi, Farias Dana, Brandman David M, Stavisky Sergey D, Miller Lee M

Affiliations

Department of Neurological Surgery, University of California, Davis.

Computer Science Graduate Group, University of California, Davis.

Publication Information

Int IEEE EMBS Conf Neural Eng. 2023 Apr;2023. doi: 10.1109/ner52421.2023.10123751. Epub 2023 May 19.

Abstract

Brain-computer interfaces (BCIs) can potentially restore lost function in patients with neurological injury. A promising new application of BCI technology has focused on speech restoration. One approach is to synthesize speech from the neural correlates of a person who cannot speak, as they attempt to do so. However, there is no established gold standard for quantifying the quality of BCI-synthesized speech. Quantitative metrics, such as applying correlation coefficients between true and decoded speech, are not applicable to anarthric users and fail to capture intelligibility by actual human listeners; by contrast, methods involving people completing forced-choice multiple-choice questionnaires are imprecise, not practical at scale, and cannot be used as cost functions for improving speech decoding algorithms. Here, we present a deep learning-based "AI Listener" that can be used to evaluate BCI speech intelligibility objectively, rapidly, and automatically. We begin by adapting several leading Automatic Speech Recognition (ASR) deep learning models - DeepSpeech, Wav2vec 2.0, and Kaldi - to suit our application. We then evaluate the performance of these ASRs on multiple speech datasets with varying levels of intelligibility, including: healthy speech, speech from people with dysarthria, and synthesized BCI speech. Our results demonstrate that the multiple-language ASR model XLSR-Wav2vec 2.0, trained to output phonemes, yields superior performance in terms of speech transcription accuracy. Notably, the AI Listener reports that several previously published BCI output datasets are not intelligible, which is consistent with human listeners.
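The "speech transcription accuracy" that such an AI Listener reports is conventionally scored as a word or phoneme error rate: the Levenshtein edit distance between a reference transcript and the ASR output, normalized by the reference length. A minimal self-contained sketch of that metric (function names are illustrative, not taken from the paper):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences
    (minimum number of insertions, deletions, and substitutions)."""
    m, n = len(ref), len(hyp)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all remaining reference tokens
    for j in range(n + 1):
        dp[0][j] = j  # insert all remaining hypothesis tokens
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # match / substitution
    return dp[m][n]

def error_rate(ref_tokens, hyp_tokens):
    """Word error rate if tokens are words; phoneme error rate if phonemes."""
    return edit_distance(ref_tokens, hyp_tokens) / max(len(ref_tokens), 1)

# Reference vs. a hypothetical ASR transcript, as ARPAbet phonemes
ref = "HH AH L OW".split()   # "hello"
hyp = "HH AH L".split()      # decoder dropped the final vowel
print(error_rate(ref, hyp))  # one deletion out of four phonemes -> 0.25
```

Because the score is computed automatically from transcripts, it can be evaluated at scale and, unlike listener questionnaires, plugged into a training loss or model-selection criterion for the speech decoder.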

Similar Articles

1
Automatic Assessment of Intelligibility in Noise in Parkinson Disease: Validation Study.
J Med Internet Res. 2022 Oct 20;24(10):e40567. doi: 10.2196/40567.
2
The Potential for a Speech Brain-Computer Interface Using Chronic Electrocorticography.
Neurotherapeutics. 2019 Jan;16(1):144-165. doi: 10.1007/s13311-018-00692-2.
3
Advances in Completely Automated Vowel Analysis for Sociophonetics: Using End-to-End Speech Recognition Systems With DARLA.
Front Artif Intell. 2021 Sep 24;4:662097. doi: 10.3389/frai.2021.662097. eCollection 2021.
4
Automatic Speech Recognition from Neural Signals: A Focused Review.
Front Neurosci. 2016 Sep 27;10:429. doi: 10.3389/fnins.2016.00429. eCollection 2016.
5
Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures.
Comput Intell Neurosci. 2022 Aug 25;2022:4473952. doi: 10.1155/2022/4473952. eCollection 2022.
6
Feedback From Automatic Speech Recognition to Elicit Clear Speech in Healthy Speakers.
Am J Speech Lang Pathol. 2023 Nov 6;32(6):2940-2959. doi: 10.1044/2023_AJSLP-23-00030. Epub 2023 Oct 12.
7
Intelligibility in Down syndrome: Effect of measurement method and listener experience.
Int J Lang Commun Disord. 2021 May;56(3):501-511. doi: 10.1111/1460-6984.12602. Epub 2021 Mar 30.

Cited By

1
The speech neuroprosthesis.
Nat Rev Neurosci. 2024 Jul;25(7):473-492. doi: 10.1038/s41583-024-00819-9. Epub 2024 May 14.

References

1
High-performance brain-to-text communication via handwriting.
Nature. 2021 May;593(7858):249-254. doi: 10.1038/s41586-021-03506-2. Epub 2021 May 12.
2
Decoding spoken English from intracortical electrode arrays in dorsal precentral gyrus.
J Neural Eng. 2020 Nov 25;17(6):066007. doi: 10.1088/1741-2552/abbfef.
3
Generating Natural, Intelligible Speech From Brain Activity in Motor, Premotor, and Inferior Frontal Cortices.
Front Neurosci. 2019 Nov 22;13:1267. doi: 10.3389/fnins.2019.01267. eCollection 2019.
4
Speech synthesis from neural decoding of spoken sentences.
Nature. 2019 Apr;568(7753):493-498. doi: 10.1038/s41586-019-1119-1. Epub 2019 Apr 24.
5
A High-Performance Neural Prosthesis Incorporating Discrete State Selection With Hidden Markov Models.
IEEE Trans Biomed Eng. 2017 Apr;64(4):935-945. doi: 10.1109/TBME.2016.2582691. Epub 2016 Jun 21.
