Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Institute of Cognitive Science, Osnabrück University, Neuer Graben 29/Schloss, Osnabrück, 49074, Lower Saxony, Germany.
Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Neuropsychology and Behavior Group (GRUNECO), Faculty of Medicine, Universidad de Antioquia, 53-108, Medellin, Aranjuez, Medellin, 050010, Colombia.
Neuroimage. 2023 Aug 15;277:120253. doi: 10.1016/j.neuroimage.2023.120253. Epub 2023 Jun 28.
Machine learning (ML) is increasingly used in cognitive, computational and clinical neuroscience. The reliable and efficient application of ML requires a sound understanding of its subtleties and limitations. Training ML models on datasets with imbalanced classes is a particularly common problem, and it can have severe consequences if not adequately addressed. With the neuroscience ML user in mind, this paper provides a didactic assessment of the class imbalance problem and illustrates its impact through systematic manipulation of data imbalance ratios in (i) simulated data and (ii) brain data recorded with electroencephalography (EEG), magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI). Our results illustrate how the widely used Accuracy (Acc) metric, which measures the overall proportion of successful predictions, yields misleadingly high performance as class imbalance increases. Because Acc weights the per-class ratios of correct predictions proportionally to class size, it largely disregards the performance on the minority class. A binary classification model that learns to systematically vote for the majority class will yield an artificially high decoding accuracy that directly reflects the imbalance between the two classes, rather than any genuine generalizable ability to discriminate between them. We show that other evaluation metrics, such as the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) and the less common Balanced Accuracy (BAcc) metric, defined as the arithmetic mean of sensitivity and specificity, provide more reliable performance evaluations for imbalanced data. Our findings also highlight the robustness of Random Forest (RF), and the benefits of using stratified cross-validation and hyperparameter optimization to tackle data imbalance.
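The failure mode described above can be made concrete with a minimal pure-Python sketch (illustrative only, not the paper's released code): a degenerate classifier that always votes for the majority class scores an Acc equal to the imbalance ratio, while BAcc, the mean of the per-class recalls, correctly reports chance-level performance.

```python
# Why Acc is misleading under class imbalance while BAcc is not:
# score a majority-class voter on a 90/10 imbalanced test set.

def accuracy(y_true, y_pred):
    """Overall proportion of correct predictions (Acc)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Arithmetic mean of per-class recalls (sensitivity and specificity
    in the binary case)."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# 90 majority-class (0) samples, 10 minority-class (1) samples
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100          # degenerate model: always vote majority

print(accuracy(y_true, y_pred))           # 0.9  -> looks impressive
print(balanced_accuracy(y_true, y_pred))  # 0.5  -> chance level
```

The 0.9 Acc here reflects nothing but the 90/10 class ratio; BAcc exposes that the model has no discriminative ability on the minority class.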
Critically, for neuroscience ML applications that seek to minimize overall classification error, we recommend the routine use of BAcc, which in the specific case of balanced data is equivalent to using standard Acc, and readily extends to multi-class settings. Importantly, we present a list of recommendations for dealing with imbalanced data, as well as open-source code to allow the neuroscience community to replicate and extend our observations and explore alternative approaches to coping with imbalanced data.
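Two properties of BAcc claimed above, equivalence to Acc on balanced data and a natural multi-class extension, can be checked with a short sketch (again illustrative, assuming the standard mean-of-per-class-recalls definition rather than reproducing the paper's code):

```python
# BAcc as the mean of per-class recalls: equals Acc on balanced data,
# and the same definition applies unchanged to more than two classes.

def balanced_accuracy(y_true, y_pred):
    per_class = {}  # class label -> (correct count, total count)
    for t, p in zip(y_true, y_pred):
        correct, total = per_class.get(t, (0, 0))
        per_class[t] = (correct + (t == p), total + 1)
    return sum(c / n for c, n in per_class.values()) / len(per_class)

# Balanced binary data: BAcc coincides with standard Acc
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(acc, balanced_accuracy(y_true, y_pred))   # 0.75 0.75

# Three-class data: per-class recalls are 1.0, 0.5, 1.0
y_true3 = [0, 0, 1, 1, 2, 2]
y_pred3 = [0, 0, 1, 0, 2, 2]
print(balanced_accuracy(y_true3, y_pred3))      # (1 + 0.5 + 1) / 3
```

On exactly balanced classes every per-class recall carries equal weight in both metrics, so the two numbers agree; with three or more classes BAcc simply averages over more recalls.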