Facing Imbalanced Data: Recommendations for the Use of Performance Metrics

Authors

Jeni László A, Cohn Jeffrey F, De La Torre Fernando

Affiliations

Carnegie Mellon University, Pittsburgh, PA.

Carnegie Mellon University, Pittsburgh, PA; University of Pittsburgh, Pittsburgh, PA.

Publication information

Int Conf Affect Comput Intell Interact Workshops. 2013;2013:245-251. doi: 10.1109/ACII.2013.47.

DOI:10.1109/ACII.2013.47

PMID:25574450

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4285355/

Abstract

Recognizing facial action units (AUs) is important for situation analysis and automated video annotation. Previous work has emphasized face tracking and registration and the choice of features and classifiers. Relatively neglected is the effect of imbalanced data on action unit detection. While the machine learning community has become aware of the problem of skewed data for training classifiers, little attention has been paid to how skew may bias performance metrics. To address this question, we conducted experiments using both simulated classifiers and three major databases that differ in size, type of FACS coding, and degree of skew. We evaluated the influence of skew on both threshold metrics (Accuracy, F-score, Cohen's kappa, and Krippendorff's alpha) and rank metrics (area under the receiver operating characteristic (ROC) curve and area under the precision-recall curve). With the exception of area under the ROC curve, all were attenuated by skewed distributions, in many cases dramatically so. While ROC was unaffected by skew, precision-recall curves suggest that ROC may mask poor performance. Our findings suggest that skew is a critical factor in evaluating performance metrics. To avoid or minimize skew-biased estimates of performance, we recommend reporting skew-normalized scores along with the obtained ones.
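The effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes a simulated classifier with fixed true-positive and true-negative rates, defines skew as the number of negatives per positive, and assumes that skew normalization means rescaling the negative-class counts of the confusion matrix so that both classes carry equal weight. The abstract does not spell out the exact procedure, and all function names here are hypothetical.

```python
def metrics(tp, fn, fp, tn):
    """Accuracy, F1 and Cohen's kappa from confusion-matrix counts (may be fractional)."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    f1 = 2 * tp / (2 * tp + fp + fn)
    # Chance agreement for Cohen's kappa, computed from the row and column marginals.
    p_chance = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "f1": f1, "kappa": kappa}


def expected_confusion(tpr, tnr, n_pos, n_neg):
    """Expected (tp, fn, fp, tn) for a simulated classifier with fixed per-class rates."""
    return tpr * n_pos, (1 - tpr) * n_pos, (1 - tnr) * n_neg, tnr * n_neg


def skew_normalized(tp, fn, fp, tn):
    """Assumed normalization: divide the negative-class counts by the skew."""
    skew = (fp + tn) / (tp + fn)  # negatives per positive
    return tp, fn, fp / skew, tn / skew


if __name__ == "__main__":
    # Same classifier (TPR 0.8, TNR 0.9), increasingly skewed test sets.
    for n_neg in (1000, 10000, 50000):
        cm = expected_confusion(0.8, 0.9, n_pos=1000, n_neg=n_neg)
        raw = {k: round(v, 3) for k, v in metrics(*cm).items()}
        norm = {k: round(v, 3) for k, v in metrics(*skew_normalized(*cm)).items()}
        print(f"skew={n_neg // 1000:>2}  raw={raw}  normalized={norm}")
```

Under these assumptions the classifier itself never changes, yet the raw F1 and kappa drop as the negative class grows, while the normalized scores stay at their balanced-data values. Reporting both, as the abstract recommends, makes results comparable across databases with different skew.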

Similar articles

Class imbalance on medical image classification: towards better evaluation practices for discrimination and calibration performance.

Eur Radiol. 2024 Dec;34(12):7895-7903. doi: 10.1007/s00330-024-10834-0. Epub 2024 Jun 11.

Class imbalance should not throw you off balance: Choosing the right classifiers and performance metrics for brain decoding with imbalanced data.

Neuroimage. 2023 Aug 15;277:120253. doi: 10.1016/j.neuroimage.2023.120253. Epub 2023 Jun 28.

Tuning model parameters in class-imbalanced learning with precision-recall curve.

Biom J. 2019 May;61(3):652-664. doi: 10.1002/bimj.201800148. Epub 2018 Dec 12.

Area under precision-recall curves for weighted and unweighted data.

PLoS One. 2014 Mar 20;9(3):e92209. doi: 10.1371/journal.pone.0092209. eCollection 2014.

Prediction of low Apgar score at five minutes following labor induction intervention in vaginal deliveries: machine learning approach for imbalanced data at a tertiary hospital in North Tanzania.

BMC Pregnancy Childbirth. 2022 Apr 1;22(1):275. doi: 10.1186/s12884-022-04534-0.

Automated Pain Detection in Facial Videos of Children using Human-Assisted Transfer Learning.

CEUR Workshop Proc. 2018 Jul;2142:10-21.

The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets.

PLoS One. 2015 Mar 4;10(3):e0118432. doi: 10.1371/journal.pone.0118432. eCollection 2015.

Conversion of adverse data corpus to shrewd output using sampling metrics.

Vis Comput Ind Biomed Art. 2020 Aug 11;3(1):19. doi: 10.1186/s42492-020-00055-9.

Federated Learning on Clinical Benchmark Data: Performance Assessment.

J Med Internet Res. 2020 Oct 26;22(10):e20891. doi: 10.2196/20891.

Cited by

Classification and predictive models using supervised machine learning: A conceptual review.

South Afr J Crit Care. 2025 May 19;41(1):e2937. doi: 10.7196/SAJCC.2025.v411.2937. eCollection 2025.

Using Wearable Sensors to Identify Home and Community-Based Movement Using Continuous and Straight Line Stepping Time.

Sensors (Basel). 2025 Aug 12;25(16):4979. doi: 10.3390/s25164979.

A Pattern Combining the Cognitive and Physical Risks Predicts Frailty Reversal in Community-Dwelling Older Individuals 3 years Later: A Decision Tree Analysis.

Sage Open Aging. 2025 Aug 20;11:30495334251365595. doi: 10.1177/30495334251365595. eCollection 2025 Jan-Dec.

Machine learning approaches to predicting medication nonadherence: a scoping review.

Int J Med Inform. 2025 Aug 14;204:106082. doi: 10.1016/j.ijmedinf.2025.106082.

A Framework for Generating Realistic Synthetic Tabular Data in a Randomized Controlled Trial Setting.

Stat Med. 2025 Aug;44(18-19):e70227. doi: 10.1002/sim.70227.

Advanced feature engineering in Acute:Chronic Workload Ratio (ACWR) calculation for injury forecasting in elite soccer.

PLoS One. 2025 Jul 23;20(7):e0327960. doi: 10.1371/journal.pone.0327960. eCollection 2025.

Integrating multi-omics and machine learning for disease resistance prediction in legumes.

Theor Appl Genet. 2025 Jun 27;138(7):163. doi: 10.1007/s00122-025-04948-2.

Identification of medicinal plant parts using depth-wise separable convolutional neural network.

PLoS One. 2025 May 7;20(5):e0322936. doi: 10.1371/journal.pone.0322936. eCollection 2025.

Navigating the Multiverse: a Hitchhiker's guide to selecting harmonization methods for multimodal biomedical data.

Biol Methods Protoc. 2025 Apr 17;10(1):bpaf028. doi: 10.1093/biomethods/bpaf028. eCollection 2025.

Development and Validation of a Machine Learning Model for Early Prediction of Delirium in Intensive Care Units Using Continuous Physiological Data: Retrospective Study.

J Med Internet Res. 2025 Apr 2;27:e59520. doi: 10.2196/59520.
