Schroeder Felix, Fairclough Stephen, Dehais Frederic, Richins Matthew
School of Psychology, Liverpool John Moores University, Liverpool, United Kingdom.
Institut Supérieur de l'Aéronautique et de l'Espace (ISAE-SUPAERO), Université de Toulouse, Toulouse, France.
Front Neuroergon. 2025 Jul 1;6:1582724. doi: 10.3389/fnrgo.2025.1582724. eCollection 2025.
Neuroadaptive technologies are a type of passive brain-computer interface (pBCI) that aim to incorporate implicit user-state information into human-machine interactions by monitoring neurophysiological signals. Evaluating machine learning and signal processing approaches represents a core aspect of research into neuroadaptive technologies. These evaluations are often conducted offline under controlled laboratory settings, where exhaustive analyses are possible. However, the manner in which classifiers are evaluated offline has been shown to impact reported accuracy levels, possibly biasing conclusions. In the current study, we investigated one of these sources of bias, the choice of cross-validation scheme, which is often not reported in sufficient detail. Across three independent electroencephalography (EEG) n-back datasets and 74 participants, we show how metrics and conclusions based on the same data can diverge with different cross-validation choices. A comparison of cross-validation schemes whose train and test subset boundaries either respect or disregard the block structure of the data collection illustrated how the relative performance of classifiers varies significantly with the evaluation method used. By computing bootstrapped 95% confidence intervals of differences across datasets, we showed that classification accuracies of Riemannian minimum distance (RMDM) classifiers may differ by up to 12.7%, while those of a Filter Bank Common Spatial Pattern (FBCSP) based linear discriminant analysis (LDA) may differ by up to 30.4%. These differences across cross-validation implementations may impact the conclusions presented in research papers, which can complicate efforts to foster reproducibility. Our results exemplify why detailed reporting on data splitting procedures should become common practice.
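The block-respecting versus block-ignoring distinction described in the abstract is straightforward to reproduce with standard tooling. The sketch below is not taken from the study; all data, parameters, and the plain LDA classifier are synthetic illustrations. It uses scikit-learn's StratifiedKFold and StratifiedGroupKFold to contrast an epoch-level shuffled split with one that holds out whole recording blocks, which is where an accuracy gap of the kind reported can appear.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import (StratifiedKFold, StratifiedGroupKFold,
                                     cross_val_score)

rng = np.random.default_rng(42)

# Synthetic stand-in for epoch-level EEG features: 8 recording blocks,
# 40 epochs per block, one workload condition (label) per block.
n_blocks, epochs_per_block, n_features = 8, 40, 16
blocks = np.repeat(np.arange(n_blocks), epochs_per_block)
y = np.repeat(np.arange(n_blocks) % 2, epochs_per_block)  # alternate low/high

# Small class-related effect plus a larger block-specific offset, mimicking
# slow non-stationarities that make epochs within a block resemble each other.
class_effect = 0.3 * y[:, None] * rng.normal(size=(1, n_features))
block_offset = rng.normal(scale=1.5, size=(n_blocks, n_features))[blocks]
X = class_effect + block_offset + rng.normal(size=(len(y), n_features))

clf = LinearDiscriminantAnalysis()

# Scheme A: epoch-level shuffled k-fold that ignores block membership, so
# epochs from the same block end up in both the train and the test folds.
cv_mixed = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
acc_mixed = cross_val_score(clf, X, y, cv=cv_mixed)

# Scheme B: block-respecting split, where whole blocks are held out together.
cv_blocked = StratifiedGroupKFold(n_splits=4)
acc_blocked = cross_val_score(clf, X, y, groups=blocks, cv=cv_blocked)

print(f"ignoring block structure:   {acc_mixed.mean():.3f}")
print(f"respecting block structure: {acc_blocked.mean():.3f}")
```

Because epochs within a block share slow drifts, the block-ignoring scheme lets the classifier exploit block identity and typically reports a higher accuracy than the block-respecting scheme, which is why the paper argues that the chosen splitting procedure must be reported in detail.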