
A Random Forests Quantile Classifier for Class Imbalanced Data.

Authors

O'Brien Robert, Ishwaran Hemant

Affiliation

Division of Biostatistics, University of Miami, Miami, FL 33136, USA.

Publication

Pattern Recognit. 2019 Jun;90:232-249. doi: 10.1016/j.patcog.2019.01.036. Epub 2019 Jan 29.

Abstract

Extending previous work on quantile classifiers (q-classifiers) we propose the q*-classifier for the class imbalance problem. The classifier assigns a sample to the minority class if the minority class conditional probability exceeds 0 < q* < 1, where q* equals the unconditional probability of observing a minority class sample. The motivation for q*-classification stems from a density-based approach and leads to the useful property that the q*-classifier maximizes the sum of the true positive and true negative rates. Moreover, because the procedure can be equivalently expressed as a cost-weighted Bayes classifier, it also minimizes weighted risk. Because of this dual optimization, the q*-classifier can achieve near zero risk in imbalance problems, while simultaneously optimizing true positive and true negative rates. We use random forests to apply q*-classification. This new method which we call RFQ is shown to outperform or is competitive with existing techniques with respect to G-mean performance and variable selection. Extensions to the multiclass imbalanced setting are also considered.
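
The thresholding rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' RFQ implementation: the probabilities below are made-up values standing in for random forest estimates of the minority class probability, and the function names (`minority_prevalence`, `q_classify`, `rates`, `g_mean`) are hypothetical. The sketch assumes 0/1 labels with 1 as the minority class, estimates q* as the minority fraction of the sample, and takes the G-mean to be the geometric mean of the true positive and true negative rates.

```python
from math import sqrt


def minority_prevalence(labels, minority=1):
    """Estimate q*: the unconditional probability of a minority-class sample."""
    return sum(1 for y in labels if y == minority) / len(labels)


def q_classify(probs, q):
    """Predict the minority class (1) when its estimated probability exceeds q."""
    return [1 if p > q else 0 for p in probs]


def rates(y_true, y_pred):
    """Return (true positive rate, true negative rate) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, tn / n_neg


def g_mean(y_true, y_pred):
    """Geometric mean of the true positive and true negative rates."""
    tpr, tnr = rates(y_true, y_pred)
    return sqrt(tpr * tnr)


# Hypothetical data: 2 minority samples out of 10, with invented
# probability estimates (a real run would take these from a fitted forest).
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
p_hat = [0.05, 0.10, 0.15, 0.10, 0.25, 0.05, 0.10, 0.15, 0.30, 0.45]

q_star = minority_prevalence(y)      # threshold for the q*-classifier
pred_half = q_classify(p_hat, 0.5)   # conventional 0.5 threshold
pred_q = q_classify(p_hat, q_star)   # q*-classifier

print("q* =", q_star)
print("0.5 threshold:", rates(y, pred_half), g_mean(y, pred_half))
print("q* threshold: ", rates(y, pred_q), g_mean(y, pred_q))
```

On this toy data the 0.5 threshold predicts the majority class for every sample, so its true positive rate and G-mean are both zero. The q* threshold recovers both minority samples at the cost of one false positive, which raises the sum of the true positive and true negative rates.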

Similar articles

2
Class-imbalanced classifiers for high-dimensional data.
Brief Bioinform. 2013 Jan;14(1):13-26. doi: 10.1093/bib/bbs006. Epub 2012 Mar 9.
4
Class prediction for high-dimensional class-imbalanced data.
BMC Bioinformatics. 2010 Oct 20;11:523. doi: 10.1186/1471-2105-11-523.
7
Cost-Sensitive Learning of Deep Feature Representations From Imbalanced Data.
IEEE Trans Neural Netw Learn Syst. 2018 Aug;29(8):3573-3587. doi: 10.1109/TNNLS.2017.2732482. Epub 2017 Aug 17.
