O'Brien Robert, Ishwaran Hemant
Division of Biostatistics, University of Miami, Miami, FL 33136, USA.
Pattern Recognit. 2019 Jun;90:232-249. doi: 10.1016/j.patcog.2019.01.036. Epub 2019 Jan 29.
Extending previous work on quantile classifiers (q-classifiers), we propose the q*-classifier for the class imbalance problem. The classifier assigns a sample to the minority class if the minority class conditional probability exceeds 0 < q* < 1, where q* equals the unconditional probability of observing a minority class sample. The motivation for q*-classification stems from a density-based approach and leads to the useful property that the q*-classifier maximizes the sum of the true positive and true negative rates. Moreover, because the procedure can be equivalently expressed as a cost-weighted Bayes classifier, it also minimizes weighted risk. Because of this dual optimization, the q*-classifier can achieve near zero risk in imbalance problems, while simultaneously optimizing true positive and true negative rates. We use random forests to apply q*-classification. This new method, which we call RFQ, is shown to outperform or be competitive with existing techniques with respect to G-mean performance and variable selection. Extensions to the multiclass imbalanced setting are also considered.
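The decision rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' RFQ implementation: the labels and minority-class probabilities below are hypothetical and hard-coded (in RFQ they would be estimated by a random forest), and the function names `q_star_classify` and `rates` are made up for this example. The sketch only shows how thresholding at the minority prevalence q*, rather than at the usual 0.5, changes the true positive rate, true negative rate, and G-mean on a small imbalanced sample.

```python
from math import sqrt


def q_star_classify(probs, q_star):
    """Label a sample 1 (minority) when its estimated minority-class
    conditional probability exceeds q_star, otherwise 0 (majority)."""
    return [1 if p > q_star else 0 for p in probs]


def rates(y_true, y_pred):
    """Return (true positive rate, true negative rate, G-mean)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    tpr = tp / n_pos
    tnr = tn / n_neg
    return tpr, tnr, sqrt(tpr * tnr)


# Hypothetical data: 2 minority samples (label 1) out of 10.
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical estimated minority-class probabilities for each sample.
probs = [0.45, 0.30, 0.25, 0.10, 0.05, 0.15, 0.02, 0.08, 0.12, 0.18]

# q* is the unconditional probability of a minority sample,
# estimated here by the observed minority proportion.
q_star = sum(y) / len(y)

pred_q = q_star_classify(probs, q_star)   # threshold at q*
pred_half = q_star_classify(probs, 0.5)   # conventional 0.5 threshold

print("q* =", q_star)
print("q*-threshold  (TPR, TNR, G-mean):", rates(y, pred_q))
print("0.5-threshold (TPR, TNR, G-mean):", rates(y, pred_half))
```

On this toy sample the 0.5 threshold assigns every sample to the majority class, so its true positive rate and G-mean are both zero, while the q* threshold recovers both minority samples at the cost of one false positive.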