Class imbalance should not throw you off balance: Choosing the right classifiers and performance metrics for brain decoding with imbalanced data.

Affiliations

Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Institute of Cognitive Science, Osnabrück University, Neuer Graben 29/Schloss, Osnabrück, 49074, Lower Saxony, Germany.

Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Neuropsychology and Behavior Group (GRUNECO), Faculty of Medicine, Universidad de Antioquia, 53-108, Medellin, Aranjuez, Medellin, 050010, Colombia.

Publication information

Neuroimage. 2023 Aug 15;277:120253. doi: 10.1016/j.neuroimage.2023.120253. Epub 2023 Jun 28.

DOI: 10.1016/j.neuroimage.2023.120253
PMID: 37385392

Abstract

Machine learning (ML) is increasingly used in cognitive, computational and clinical neuroscience. The reliable and efficient application of ML requires a sound understanding of its subtleties and limitations. Training ML models on datasets with imbalanced classes is a particularly common problem, and it can have severe consequences if not adequately addressed. With the neuroscience ML user in mind, this paper provides a didactic assessment of the class imbalance problem and illustrates its impact through systematic manipulation of data imbalance ratios in (i) simulated data and (ii) brain data recorded with electroencephalography (EEG), magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI). Our results illustrate how the widely used Accuracy (Acc) metric, which measures the overall proportion of successful predictions, yields misleadingly high performances as class imbalance increases. Because Acc weights the per-class ratios of correct predictions proportionally to class size, it largely disregards the performance on the minority class. A binary classification model that learns to systematically vote for the majority class will yield an artificially high decoding accuracy that directly reflects the imbalance between the two classes, rather than any genuine generalizable ability to discriminate between them. We show that other evaluation metrics, such as the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) and the less common Balanced Accuracy (BAcc) metric, defined as the arithmetic mean of sensitivity and specificity, provide more reliable performance evaluations for imbalanced data. Our findings also highlight the robustness of Random Forest (RF), and the benefits of using stratified cross-validation and hyperparameter optimization to tackle data imbalance.
Critically, for neuroscience ML applications that seek to minimize overall classification error, we recommend the routine use of BAcc, which in the specific case of balanced data is equivalent to using standard Acc, and readily extends to multi-class settings. Importantly, we present a list of recommendations for dealing with imbalanced data, as well as open-source code to allow the neuroscience community to replicate and extend our observations and explore alternative approaches to coping with imbalanced data.
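
The contrast between Acc and BAcc described in the abstract can be illustrated with a small, self-contained Python sketch. This is not the authors' open-source code. The function names (`accuracy`, `balanced_accuracy`, `majority_vote`) and the imbalance ratios are hypothetical, chosen only for illustration, and BAcc is computed here as the mean of per-class recalls, which reduces to the mean of sensitivity and specificity in the binary case.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Acc: overall proportion of correct predictions."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """BAcc: unweighted mean of per-class recalls.

    With two classes this is the arithmetic mean of sensitivity
    and specificity; with more classes it is the macro-averaged recall.
    """
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


def majority_vote(y_true):
    """Hypothetical degenerate classifier: always predicts the majority class."""
    majority = Counter(y_true).most_common(1)[0][0]
    return [majority] * len(y_true)


# Illustrative imbalance ratios (assumed, not taken from the paper).
for n_major, n_minor in [(50, 50), (70, 30), (90, 10), (99, 1)]:
    y_true = [0] * n_major + [1] * n_minor
    y_pred = majority_vote(y_true)
    print(f"{n_major}:{n_minor}  "
          f"Acc={accuracy(y_true, y_pred):.2f}  "
          f"BAcc={balanced_accuracy(y_true, y_pred):.2f}")

# On balanced data, Acc and BAcc coincide for any set of predictions.
y_true_bal = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred_bal = [0, 0, 0, 1, 1, 1, 0, 0]
print(accuracy(y_true_bal, y_pred_bal), balanced_accuracy(y_true_bal, y_pred_bal))

# Multi-class case: the same per-class averaging applies unchanged.
y_true_mc = [0] * 6 + [1] * 3 + [2] * 1
y_pred_mc = majority_vote(y_true_mc)
print(accuracy(y_true_mc, y_pred_mc), balanced_accuracy(y_true_mc, y_pred_mc))
```

In this toy setting, the majority-vote classifier's Acc climbs from 0.50 to 0.99 as the imbalance grows, while its BAcc stays at 0.50, the chance level for two classes. In practice, an established library implementation of balanced accuracy would normally be preferred over hand-written helpers like these.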

Similar articles

1. Class imbalance should not throw you off balance: Choosing the right classifiers and performance metrics for brain decoding with imbalanced data.
   Neuroimage. 2023 Aug 15;277:120253. doi: 10.1016/j.neuroimage.2023.120253. Epub 2023 Jun 28.
2. Exceeding chance level by chance: The caveat of theoretical chance levels in brain signal classification and statistical assessment of decoding accuracy.
   J Neurosci Methods. 2015 Jul 30;250:126-36. doi: 10.1016/j.jneumeth.2015.01.010. Epub 2015 Jan 14.
3. Class imbalance on medical image classification: towards better evaluation practices for discrimination and calibration performance.
   Eur Radiol. 2024 Dec;34(12):7895-7903. doi: 10.1007/s00330-024-10834-0. Epub 2024 Jun 11.
4. Assessing and mitigating the effects of class imbalance in machine learning with application to X-ray imaging.
   Int J Comput Assist Radiol Surg. 2020 Dec;15(12):2041-2048. doi: 10.1007/s11548-020-02260-6. Epub 2020 Sep 23.
5. Class prediction for high-dimensional class-imbalanced data.
   BMC Bioinformatics. 2010 Oct 20;11:523. doi: 10.1186/1471-2105-11-523.
6. Inverse free reduced universum twin support vector machine for imbalanced data classification.
   Neural Netw. 2023 Jan;157:125-135. doi: 10.1016/j.neunet.2022.10.003. Epub 2022 Oct 15.
7. Post-boosting of classification boundary for imbalanced data using geometric mean.
   Neural Netw. 2017 Dec;96:101-114. doi: 10.1016/j.neunet.2017.09.004. Epub 2017 Sep 14.
8. Comparison of Resampling Techniques for Imbalanced Datasets in Machine Learning: Application to Epileptogenic Zone Localization From Interictal Intracranial EEG Recordings in Patients With Focal Epilepsy.
   Front Neuroinform. 2021 Nov 19;15:715421. doi: 10.3389/fninf.2021.715421. eCollection 2021.
9. Structure-activity relationship-based chemical classification of highly imbalanced Tox21 datasets.
   J Cheminform. 2020 Oct 27;12(1):66. doi: 10.1186/s13321-020-00468-x.
10. A systematic study of the class imbalance problem in convolutional neural networks.
    Neural Netw. 2018 Oct;106:249-259. doi: 10.1016/j.neunet.2018.07.011. Epub 2018 Jul 29.

Cited by

1. Computational Mechanisms of Approach-Avoidance Conflict Predictively Differentiate Between Affective and Substance Use Disorders.
   Comput Psychiatr. 2025 Sep 5;9(1):159-186. doi: 10.5334/cpsy.131. eCollection 2025.
2. Preoperatively predicting failure to achieve the minimum clinically important difference and substantial clinical benefit for total knee arthroplasty patients using machine learning.
   Knee Surg Relat Res. 2025 Sep 10;37(1):37. doi: 10.1186/s43019-025-00289-y.
3. Predicting suicidality in people living with HIV in Uganda: a machine learning approach.
   Front Psychiatry. 2025 Aug 15;16:1584335. doi: 10.3389/fpsyt.2025.1584335. eCollection 2025.
4. Comparison of machine learning models for mucopolysaccharidosis early diagnosis using UAE medical records.
   Sci Rep. 2025 Aug 6;15(1):28813. doi: 10.1038/s41598-025-13879-3.
5. Machine Learning and Artificial Intelligence for Infectious Disease Surveillance, Diagnosis, and Prognosis.
   Viruses. 2025 Jun 23;17(7):882. doi: 10.3390/v17070882.
6. HERGAI: an artificial intelligence tool for structure-based prediction of hERG inhibitors.
   J Cheminform. 2025 Jul 24;17(1):110. doi: 10.1186/s13321-025-01063-8.
7. Predicting patient risk of leaving without being seen using machine learning: a retrospective study in a single overcrowded emergency department.
   BMC Emerg Med. 2025 Jul 15;25(1):121. doi: 10.1186/s12873-025-01287-9.
8. Towards collaborative data science in mental health research: The ECNP neuroimaging network accessible data repository.
   Neurosci Appl. 2024 Dec 9;4:105407. doi: 10.1016/j.nsa.2024.105407. eCollection 2025.
9. How EEG preprocessing shapes decoding performance.
   Commun Biol. 2025 Jul 10;8(1):1039. doi: 10.1038/s42003-025-08464-3.
10. Comprehensive plant health monitoring: expert-level assessment with spatio-temporal image data.
    Front Plant Sci. 2025 May 30;16:1511651. doi: 10.3389/fpls.2025.1511651. eCollection 2025.