Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia, United States of America.
PLoS One. 2012;7(7):e40598. doi: 10.1371/journal.pone.0040598. Epub 2012 Jul 6.
The receiver operating characteristic (ROC) curve is an important tool for gauging the performance of classifiers. In certain high-throughput data analysis settings, the data are heavily class-skewed, i.e. most of the features tested belong to the true negative class. In such cases, only a small portion of the ROC curve is relevant in practical terms, so the ROC curve and its area under the curve (AUC) are insufficient for judging classifier performance. Here we define an ROC surface (ROCS) using the true positive rate (TPR), false positive rate (FPR), and true discovery rate (TDR). The ROC surface, together with two associated quantities, the volume under the surface (VUS) and the FDR-controlled area under the ROC curve (FCAUC), provides a useful approach for gauging classifier performance on class-skewed high-throughput data. An implementation is available as an R package at http://userwww.service.emory.edu/~tyu8/ROCS/.
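To make the three rates concrete, the following minimal Python sketch computes (FPR, TPR, TDR) at every score cutoff and then a partial ROC area restricted to cutoffs whose FDR (taken here as 1 - TDR) stays within a chosen bound. It is not the ROCS R package and does not reproduce its API. The function names `roc_surface_points` and `fdr_controlled_auc`, the toy data, and the rule of stopping at the first cutoff that exceeds the FDR bound are all assumptions made for illustration; the abstract does not give the exact definitions of FCAUC or VUS.

```python
def roc_surface_points(scores, labels):
    """Return one (fpr, tpr, tdr) triple per distinct score cutoff.

    Cutoffs are visited from the highest score down; labels are 1 for a
    true positive feature and 0 for a true negative feature.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = []
    tp = fp = 0
    i = 0
    while i < len(pairs):
        cutoff = pairs[i][0]
        # Count every item tied at this cutoff before emitting a point.
        while i < len(pairs) and pairs[i][0] == cutoff:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos, tp / (tp + fp)))
    return points


def fdr_controlled_auc(points, max_fdr):
    """Trapezoidal area under the ROC curve over the leading cutoffs
    whose FDR = 1 - TDR does not exceed max_fdr.

    Assumption: integration stops at the first cutoff that violates the
    bound. This is one plausible reading of an FDR-controlled AUC, not
    necessarily the definition used in the paper.
    """
    area = 0.0
    prev_fpr, prev_tpr = 0.0, 0.0
    for fpr, tpr, tdr in points:
        if 1.0 - tdr > max_fdr:
            break
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_fpr, prev_tpr = fpr, tpr
    return area


# Hypothetical class-skewed toy data: 3 true positives among 10 features.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
points = roc_surface_points(scores, labels)
full_auc = fdr_controlled_auc(points, 1.0)     # no FDR restriction
partial_auc = fdr_controlled_auc(points, 0.5)  # only cutoffs with FDR <= 0.5
```

On this toy data the unrestricted area is high even though most discoveries at lenient cutoffs are false, which is the situation the abstract describes: the FDR-restricted area counts only the part of the curve a practitioner would actually use.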