Small-sample precision of ROC-related estimates.

Affiliation

LIPADE, University Paris Descartes, Paris, France.

Publication information

Bioinformatics. 2010 Mar 15;26(6):822-30. doi: 10.1093/bioinformatics/btq037. Epub 2010 Feb 3.

DOI:10.1093/bioinformatics/btq037

PMID:20130029

Abstract

MOTIVATION

Receiver operating characteristic (ROC) curves are commonly used in biomedical applications to judge the performance of a discriminant across varying decision thresholds. The estimated ROC curve depends on the true positive rate (TPR) and false positive rate (FPR), with the key metric being the area under the curve (AUC). With small samples these rates need to be estimated from the training data, so a natural question arises: how well do the estimates of the AUC, TPR and FPR compare with the true metrics?
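To make the quantities above concrete, here is a minimal sketch of how empirical (FPR, TPR) pairs and the AUC can be computed from a small labelled sample. The function names `roc_points` and `auc_trapezoid` and the toy scores are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold.

    A sample is called positive when its score is >= the threshold.
    Labels are 1 for the positive class and 0 for the negative class.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy sample: six discriminant scores with their true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
auc = auc_trapezoid(points)
print(points)
print(auc)
```

With only three samples per class, each TPR and FPR value can move only in steps of 1/3, which hints at why such estimates are coarse for small samples.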

RESULTS

Through a simulation study using data models and analysis of real microarray data, we show that (i) for small samples the root mean square differences of the estimated and true metrics are considerable; (ii) even for large samples, there is only weak correlation between the true and estimated metrics; and (iii) generally, there is weak regression of the true metric on the estimated metric. For classification rules, we consider linear discriminant analysis, linear support vector machine (SVM) and radial basis function SVM. For error estimation, we consider resubstitution, three kinds of cross-validation and bootstrap. Using resampling, we show the unreliability of some published ROC results.
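The kind of comparison described above can be sketched with a much simpler setup than the paper's. The following is an assumption-laden illustration, not the authors' protocol: it uses a Gaussian model with identity covariance, a mean-difference direction as a stand-in for linear discriminant analysis, and resubstitution as the only estimator. All parameter values (`n_per_class`, `dim`, `delta`, `reps`) are hypothetical. Because the model is known, the true AUC of each fitted direction is available in closed form and can be compared with its estimate.

```python
import math
import random
from statistics import NormalDist


def empirical_auc(pos, neg):
    """Mann-Whitney estimate: fraction of (pos, neg) pairs ranked correctly."""
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def pearson(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def simulate(n_per_class=10, dim=5, delta=1.0, reps=500, seed=0):
    """Return (RMS difference, mean bias, correlation) of estimated vs true AUC."""
    rng = random.Random(seed)
    mu1 = [delta] + [0.0] * (dim - 1)  # class-1 mean; class-0 mean is the origin
    est, true = [], []
    for _ in range(reps):
        x0 = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_per_class)]
        x1 = [[rng.gauss(m, 1.0) for m in mu1] for _ in range(n_per_class)]
        # Direction = difference of sample means (assumed stand-in for LDA
        # when the covariance is known to be the identity).
        w = [sum(r[j] for r in x1) / n_per_class - sum(r[j] for r in x0) / n_per_class
             for j in range(dim)]

        def score(x):
            return sum(wj * xj for wj, xj in zip(w, x))

        # Resubstitution: score the same points that were used to fit w.
        est.append(empirical_auc([score(x) for x in x1], [score(x) for x in x0]))
        # True AUC: the score difference between a class-1 and a class-0 point
        # is Gaussian with mean w[0]*delta and variance 2*||w||^2.
        norm_w = math.sqrt(sum(wj * wj for wj in w))
        true.append(NormalDist().cdf(w[0] * delta / (math.sqrt(2.0) * norm_w)))
    rms = math.sqrt(sum((e - t) ** 2 for e, t in zip(est, true)) / reps)
    bias = sum(e - t for e, t in zip(est, true)) / reps
    return rms, bias, pearson(est, true)


rms, bias, corr = simulate(reps=200, seed=1)
print(f"RMS difference: {rms:.3f}, mean bias: {bias:.3f}, correlation: {corr:.3f}")
```

Swapping the resubstitution step for a cross-validation or bootstrap loop, or the mean-difference direction for another classification rule, would follow the same pattern: fit on the sample, estimate the AUC from the sample, and compare against the model-based true value.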

AVAILABILITY

Companion web site at http://compbio.tgen.org/paper_supp/ROC/roc.html

CONTACT

edward@mail.ece.tamu.edu.

Similar articles

Small-sample precision of ROC-related estimates.

Bioinformatics. 2010 Mar 15;26(6):822-30. doi: 10.1093/bioinformatics/btq037. Epub 2010 Feb 3.

Optimal number of features as a function of sample size for various classification rules.

Bioinformatics. 2005 Apr 15;21(8):1509-15. doi: 10.1093/bioinformatics/bti171. Epub 2004 Nov 30.

Reporting bias when using real data sets to analyze classification performance.

Bioinformatics. 2010 Jan 1;26(1):68-76. doi: 10.1093/bioinformatics/btp605. Epub 2009 Oct 21.

What should be expected from feature selection in small-sample settings.

Bioinformatics. 2006 Oct 1;22(19):2430-6. doi: 10.1093/bioinformatics/btl407. Epub 2006 Jul 26.

Bias in error estimation when using cross-validation for model selection.

BMC Bioinformatics. 2006 Feb 23;7:91. doi: 10.1186/1471-2105-7-91.

Genetic test bed for feature selection.

Bioinformatics. 2006 Apr 1;22(7):837-42. doi: 10.1093/bioinformatics/btl008. Epub 2006 Jan 20.

Prediction error estimation: a comparison of resampling methods.

Bioinformatics. 2005 Aug 1;21(15):3301-7. doi: 10.1093/bioinformatics/bti499. Epub 2005 May 19.

On linear combinations of dichotomizers for maximizing the area under the ROC curve.

IEEE Trans Syst Man Cybern B Cybern. 2011 Jun;41(3):610-20. doi: 10.1109/TSMCB.2010.2060325. Epub 2010 Aug 30.

Classification based upon gene expression data: bias and precision of error rates.

Bioinformatics. 2007 Jun 1;23(11):1363-70. doi: 10.1093/bioinformatics/btm117. Epub 2007 Mar 28.

Support vector machines and other pattern recognition approaches to the diagnosis of cerebral palsy gait.

IEEE Trans Biomed Eng. 2006 Dec;53(12 Pt 1):2479-90. doi: 10.1109/TBME.2006.883697.

Cited by

Assessing Random Forest self-reproducibility for optimal short biomarker signature discovery.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf318.

Frequency domain analysis of steady-state visual evoked potentials in dogs with optic neuritis: a pilot study.

Front Vet Sci. 2025 Jun 24;12:1603620. doi: 10.3389/fvets.2025.1603620. eCollection 2025.

Curvature estimation techniques for advancing neurodegenerative disease analysis: a systematic review of machine learning and deep learning approaches.

Am J Neurodegener Dis. 2025 Feb 25;14(1):1-33. doi: 10.62347/DZNQ2482. eCollection 2025.

Protein Profiles Predict Treatment Responses to the PI3K Inhibitor Umbralisib in Patients with Chronic Lymphocytic Leukemia.

Clin Cancer Res. 2025 May 15;31(10):1943-1955. doi: 10.1158/1078-0432.CCR-24-2911.

Comparison of ROX index with modified indices incorporating heart rate, flow rate, and PaO2/FiO2 ratio for early prediction of outcomes among patients initiated on post-extubation high-flow nasal cannula therapy.

Eur J Med Res. 2025 Mar 14;30(1):166. doi: 10.1186/s40001-025-02402-z.

Assessments of lung nodules by an artificial intelligence chatbot using longitudinal CT images.

Cell Rep Med. 2025 Mar 18;6(3):101988. doi: 10.1016/j.xcrm.2025.101988. Epub 2025 Mar 4.

Machine learning predictive model for lumbar disc reherniation following microsurgical discectomy.

Brain Spine. 2024 Oct 10;4:103918. doi: 10.1016/j.bas.2024.103918. eCollection 2024.

Capturing biomarkers associated with Alzheimer disease subtypes using data distribution characteristics.

Front Comput Neurosci. 2024 Sep 3;18:1388504. doi: 10.3389/fncom.2024.1388504. eCollection 2024.

A microRNA diagnostic biomarker for amyotrophic lateral sclerosis.

Brain Commun. 2024 Sep 13;6(5):fcae268. doi: 10.1093/braincomms/fcae268. eCollection 2024.

Clinical cut-off scores for the Borderline Personality Features Scale for Children to differentiate among adolescents with Borderline Personality Disorder, other psychopathology, and no psychopathology: a replication study.

Borderline Personal Disord Emot Dysregul. 2024 Aug 26;11(1):21. doi: 10.1186/s40479-024-00264-1.

