LIPADE, University Paris Descartes, Paris, France.
Bioinformatics. 2010 Mar 15;26(6):822-30. doi: 10.1093/bioinformatics/btq037. Epub 2010 Feb 3.
Receiver operating characteristic (ROC) curves are commonly used in biomedical applications to judge the performance of a discriminant across varying decision thresholds. The estimated ROC curve depends on the true positive rate (TPR) and false positive rate (FPR), with the key metric being the area under the curve (AUC). With small samples these rates need to be estimated from the training data, so a natural question arises: how well do the estimates of the AUC, TPR and FPR compare with the true metrics?
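As an illustration of the quantities named above, the following minimal sketch computes empirical TPR/FPR pairs by sweeping a decision threshold over classifier scores and then integrates them into an AUC with the trapezoid rule. The function names (`roc_points`, `auc_trapezoid`) and the six example scores are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (fpr, tpr) arrays obtained by sweeping the decision threshold.

    labels: 1 for the positive class, 0 for the negative class.
    A sample is called positive when its score is >= the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_pos = np.sum(labels == 1)
    n_neg = np.sum(labels == 0)
    # Start at (0, 0): a threshold above every score predicts no positives.
    fpr, tpr = [0.0], [0.0]
    # Visit each distinct score from highest to lowest.
    for t in np.unique(scores)[::-1]:
        predicted_pos = scores >= t
        tpr.append(np.sum(predicted_pos & (labels == 1)) / n_pos)
        fpr.append(np.sum(predicted_pos & (labels == 0)) / n_neg)
    return np.array(fpr), np.array(tpr)

def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear ROC curve (trapezoid rule)."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Hypothetical toy data: three positives and three negatives.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_points(scores, labels)
auc = auc_trapezoid(fpr, tpr)
print(f"AUC = {auc:.4f}")  # 8 of the 9 positive/negative pairs are ranked correctly
```

With only a handful of samples, each misranked pair moves the estimated AUC by a large step, which is one way to see why small-sample estimates can be far from the true value.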
Through a simulation study using data models and analysis of real microarray data, we show that (i) for small samples the root mean square differences of the estimated and true metrics are considerable; (ii) even for large samples, there is only weak correlation between the true and estimated metrics; and (iii) generally, there is weak regression of the true metric on the estimated metric. For classification rules, we consider linear discriminant analysis, linear support vector machine (SVM) and radial basis function SVM. For error estimation, we consider resubstitution, three kinds of cross-validation and bootstrap. Using resampling, we show the unreliability of some published ROC results.
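The kind of comparison described above can be sketched in a few lines. The example below is not the authors' protocol: it assumes a hypothetical two-class Gaussian model with identity covariance, trains a linear discriminant (LDA) on a small sample, and compares the resubstitution AUC with the true AUC, which for this model has a closed form. All parameter values (`n_per_class`, `dim`, `delta`, `trials`) are illustrative assumptions, and only resubstitution is shown; cross-validation and bootstrap estimators are omitted.

```python
import math
import numpy as np

def empirical_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    diff = pos_scores[:, None] - neg_scores[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def lda_direction(x0, x1):
    """LDA weight vector: pooled-covariance inverse times the mean difference."""
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    c0, c1 = x0 - m0, x1 - m1
    pooled = (c0.T @ c0 + c1.T @ c1) / (len(x0) + len(x1) - 2)
    return np.linalg.solve(pooled, m1 - m0)

def true_auc(w, mu0, mu1):
    """Exact AUC of the score w.x under the assumed model.

    With identity covariance, each class score is N(w.mu, w.w), so
    P(score_pos > score_neg) = Phi(w.(mu1 - mu0) / sqrt(2 * w.w)).
    """
    z = (w @ (mu1 - mu0)) / math.sqrt(2.0 * (w @ w))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulate(n_per_class=10, dim=5, delta=0.5, trials=200, seed=0):
    """Repeat small-sample training; return estimated and true AUCs."""
    rng = np.random.default_rng(seed)
    mu0 = np.full(dim, -delta / 2)  # hypothetical class means
    mu1 = np.full(dim, delta / 2)
    est, true = [], []
    for _ in range(trials):
        x0 = rng.normal(mu0, 1.0, size=(n_per_class, dim))
        x1 = rng.normal(mu1, 1.0, size=(n_per_class, dim))
        w = lda_direction(x0, x1)
        # Resubstitution: score the same samples used for training.
        est.append(empirical_auc(x1 @ w, x0 @ w))
        true.append(true_auc(w, mu0, mu1))
    est, true = np.array(est), np.array(true)
    rms = float(np.sqrt(np.mean((est - true) ** 2)))
    corr = float(np.corrcoef(est, true)[0, 1])
    return est, true, rms, corr

est, true, rms, corr = simulate()
print(f"RMS difference between estimated and true AUC: {rms:.3f}")
print(f"Correlation between estimated and true AUC:   {corr:.3f}")
```

The RMS difference and the correlation printed here correspond to the summary quantities in findings (i) and (ii); the values depend entirely on the assumed model and should not be read as reproducing the paper's results.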
Companion web site at http://compbio.tgen.org/paper_supp/ROC/roc.html