Wu Xiaolong, Hu Kejia, Fu Zhichun, Zhang Dingguo
Department of Electronic and Electrical Engineering, University of Bath, Bath, United Kingdom.
Department of Neurosurgery, Center for Functional Neurosurgery, Ruijin Hospital Affiliated with Shanghai Jiao Tong University School of Medicine, Shanghai, China.
Imaging Neurosci (Camb). 2025 Sep 10;3. doi: 10.1162/IMAG.a.146. eCollection 2025.
Brain-computer interfaces (BCIs) that reconstruct speech waveforms from neural signals are a promising communication technology. However, the field lacks a standardized evaluation metric, which makes it difficult to compare results across studies. Existing objective metrics, such as the correlation coefficient (CC) and mel cepstral distortion (MCD), are often used inconsistently and have intrinsic limitations. This study addresses the need for a robust, validated method for evaluating the quality of reconstructed waveforms. We review the literature on waveform reconstruction from intracranial signals and describe the problems with current evaluation methods. We collated reconstructed audio from 10 published speech BCI studies and collected Mean Opinion Scores (MOS) from human raters to serve as a perceptual ground truth. We then systematically evaluated how well combinations of existing objective metrics, namely short-time objective intelligibility (STOI) and MCD, predict these MOS ratings. To ensure robustness and generalizability, we used leave-one-dataset-out cross-validation and compared multiple models, including linear and non-linear regressors. This work identifies, for the first time, that the lack of a standard evaluation method prevents cross-study comparison. Using 10 public datasets, our analysis shows that a non-linear model, specifically a Random Forest regressor, provides the most accurate and reliable prediction of subjective MOS ratings (R² = 0.892). We propose this cross-validated Random Forest model, which maps STOI and MCD to a predicted MOS, as a standardized objective evaluation metric for the speech BCI field. With its demonstrated accuracy and robust validation, it outperforms the available methods. It gives the community a reliable tool to benchmark performance, enables meaningful cross-study comparisons, and can accelerate progress in speech neuroprosthetics.
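The leave-one-dataset-out validation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy samples, the field names (dataset, stoi, mcd, mos), and the function names are all hypothetical, and a small k-nearest-neighbour regressor stands in for the Random Forest so that the example runs with the Python standard library alone.

```python
import math
from statistics import mean

# Hypothetical toy data: one entry per reconstructed clip, tagged with its source study.
# The STOI, MCD, and MOS values are invented for illustration only.
SAMPLES = [
    {"dataset": "A", "stoi": 0.30, "mcd": 8.0, "mos": 1.5},
    {"dataset": "A", "stoi": 0.55, "mcd": 6.0, "mos": 2.6},
    {"dataset": "A", "stoi": 0.80, "mcd": 4.0, "mos": 4.0},
    {"dataset": "B", "stoi": 0.35, "mcd": 7.5, "mos": 1.8},
    {"dataset": "B", "stoi": 0.60, "mcd": 5.5, "mos": 3.0},
    {"dataset": "B", "stoi": 0.85, "mcd": 3.5, "mos": 4.3},
    {"dataset": "C", "stoi": 0.25, "mcd": 8.5, "mos": 1.2},
    {"dataset": "C", "stoi": 0.50, "mcd": 6.5, "mos": 2.4},
    {"dataset": "C", "stoi": 0.75, "mcd": 4.5, "mos": 3.7},
]


def knn_predict(train, query, k=2):
    """Stand-in non-linear regressor (NOT a Random Forest).

    Returns the mean MOS of the k training clips closest to the query in
    (STOI, MCD) space. Features are not rescaled, so MCD dominates the distance.
    """
    nearest = sorted(
        train,
        key=lambda s: math.dist((s["stoi"], s["mcd"]), (query["stoi"], query["mcd"])),
    )[:k]
    return mean(s["mos"] for s in nearest)


def leave_one_dataset_out(samples, predict):
    """Hold out each dataset in turn and predict its clips from all other datasets."""
    y_true, y_pred = [], []
    for held_out in sorted({s["dataset"] for s in samples}):
        train = [s for s in samples if s["dataset"] != held_out]
        test = [s for s in samples if s["dataset"] == held_out]
        for clip in test:
            y_true.append(clip["mos"])
            y_pred.append(predict(train, clip))
    return y_true, y_pred


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true, y_pred = leave_one_dataset_out(SAMPLES, knn_predict)
print(f"Pooled leave-one-dataset-out R^2 on toy data: {r_squared(y_true, y_pred):.3f}")
```

Because every clip from the held-out study is excluded from training, the pooled R² reflects how well the mapping transfers to a study it has never seen, which is the property a cross-study metric needs. With scikit-learn installed, the same scheme could be expressed with `sklearn.model_selection.LeaveOneGroupOut` and `sklearn.ensemble.RandomForestRegressor`, passing the dataset label as the group.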