Small-sample precision of ROC-related estimates.

Affiliation

LIPADE, University Paris Descartes, Paris, France.

Publication information

Bioinformatics. 2010 Mar 15;26(6):822-30. doi: 10.1093/bioinformatics/btq037. Epub 2010 Feb 3.

DOI: 10.1093/bioinformatics/btq037
PMID: 20130029

Abstract

MOTIVATION

The receiver operator characteristic (ROC) curves are commonly used in biomedical applications to judge the performance of a discriminant across varying decision thresholds. The estimated ROC curve depends on the true positive rate (TPR) and false positive rate (FPR), with the key metric being the area under the curve (AUC). With small samples these rates need to be estimated from the training data, so a natural question arises: How well do the estimates of the AUC, TPR and FPR compare with the true metrics?
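
To make the quantities in the paragraph above concrete, the following minimal sketch computes the TPR and FPR at one decision threshold and the empirical AUC from a set of discriminant scores. The function names, the "score >= threshold means positive" convention, and the toy data are illustrative assumptions, not taken from the paper.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (TPR, FPR), calling a sample positive when score >= threshold.

    labels use 1 for the positive class and 0 for the negative class.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, fp / n_neg


def empirical_auc(scores, labels):
    """Empirical AUC as the fraction of (positive, negative) pairs in which
    the positive sample scores higher; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three positives followed by three negatives.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    print(rates_at_threshold(toy_scores, toy_labels, 0.5))
    print(empirical_auc(toy_scores, toy_labels))
```

Sweeping the threshold over all observed scores traces out the empirical ROC curve. The pairwise count used for the AUC is the Mann-Whitney form of the same area.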

RESULTS

Through a simulation study using data models and analysis of real microarray data, we show that (i) for small samples the root mean square differences of the estimated and true metrics are considerable; (ii) even for large samples, there is only weak correlation between the true and estimated metrics; and (iii) generally, there is weak regression of the true metric on the estimated metric. For classification rules, we consider linear discriminant analysis, linear support vector machine (SVM) and radial basis function SVM. For error estimation, we consider resubstitution, three kinds of cross-validation and bootstrap. Using resampling, we show the unreliability of some published ROC results.
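
The kind of comparison described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' simulation. It assumes a two-dimensional Gaussian model with identity covariance, uses only linear discriminant analysis with a resubstitution estimate of the AUC, and picks hypothetical sample sizes and class separation. Under this model the true AUC of a linear discriminant with direction w has the closed form Phi(w·d / (sqrt(2)·|w|)), where d is the difference of the class means.

```python
import math
import random


def sample_class(rng, mean, n):
    """Draw n points from a 2-D Gaussian with identity covariance (assumed model)."""
    return [(rng.gauss(mean[0], 1.0), rng.gauss(mean[1], 1.0)) for _ in range(n)]


def fit_lda_direction(x0, x1):
    """Return the LDA direction w = pooled_cov^{-1} (m1 - m0) for 2-D data."""
    def mean(xs):
        return (sum(p[0] for p in xs) / len(xs), sum(p[1] for p in xs) / len(xs))
    m0, m1 = mean(x0), mean(x1)
    sxx = sxy = syy = 0.0
    for xs, m in ((x0, m0), (x1, m1)):
        for px, py in xs:
            dx, dy = px - m[0], py - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    dof = len(x0) + len(x1) - 2
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof
    det = sxx * syy - sxy * sxy
    dmx, dmy = m1[0] - m0[0], m1[1] - m0[1]
    # Explicit inverse of the 2x2 pooled covariance applied to (m1 - m0).
    return ((syy * dmx - sxy * dmy) / det, (-sxy * dmx + sxx * dmy) / det)


def empirical_auc(pos, neg):
    """Fraction of (positive, negative) score pairs ranked correctly."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def true_auc(w, mu0, mu1):
    """Exact AUC of the score w.x under the assumed identity-covariance model."""
    a = w[0] * (mu1[0] - mu0[0]) + w[1] * (mu1[1] - mu0[1])
    return 0.5 * (1.0 + math.erf(a / (2.0 * math.hypot(w[0], w[1]))))


def simulate(n_per_class=10, reps=200, delta=1.0, seed=0):
    """Return a list of (resubstitution AUC, true AUC) pairs."""
    rng = random.Random(seed)
    mu0, mu1 = (0.0, 0.0), (delta, delta)
    pairs = []
    for _ in range(reps):
        x0 = sample_class(rng, mu0, n_per_class)
        x1 = sample_class(rng, mu1, n_per_class)
        w = fit_lda_direction(x0, x1)
        score = lambda p: w[0] * p[0] + w[1] * p[1]
        # Resubstitution: score the same points the rule was trained on.
        est = empirical_auc([score(p) for p in x1], [score(p) for p in x0])
        pairs.append((est, true_auc(w, mu0, mu1)))
    return pairs


def rms_difference(pairs):
    return math.sqrt(sum((e - t) ** 2 for e, t in pairs) / len(pairs))


def pearson(pairs):
    n = len(pairs)
    me = sum(e for e, _ in pairs) / n
    mt = sum(t for _, t in pairs) / n
    cov = sum((e - me) * (t - mt) for e, t in pairs)
    ve = sum((e - me) ** 2 for e, _ in pairs)
    vt = sum((t - mt) ** 2 for _, t in pairs)
    return cov / math.sqrt(ve * vt)


if __name__ == "__main__":
    results = simulate()
    print("RMS difference:", rms_difference(results))
    print("Correlation (estimated vs. true):", pearson(results))
```

Changing `n_per_class` shows how the RMS difference and the correlation between estimated and true AUC vary with sample size in this toy setting. The paper's own study also covers SVM classifiers, cross-validation and bootstrap estimators, which this sketch does not attempt.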

AVAILABILITY

Companion web site at http://compbio.tgen.org/paper_supp/ROC/roc.html

CONTACT

edward@mail.ece.tamu.edu.

Similar articles

1
Small-sample precision of ROC-related estimates.
Bioinformatics. 2010 Mar 15;26(6):822-30. doi: 10.1093/bioinformatics/btq037. Epub 2010 Feb 3.
2
Optimal number of features as a function of sample size for various classification rules.
Bioinformatics. 2005 Apr 15;21(8):1509-15. doi: 10.1093/bioinformatics/bti171. Epub 2004 Nov 30.
3
Reporting bias when using real data sets to analyze classification performance.
Bioinformatics. 2010 Jan 1;26(1):68-76. doi: 10.1093/bioinformatics/btp605. Epub 2009 Oct 21.
4
What should be expected from feature selection in small-sample settings.
Bioinformatics. 2006 Oct 1;22(19):2430-6. doi: 10.1093/bioinformatics/btl407. Epub 2006 Jul 26.
5
Bias in error estimation when using cross-validation for model selection.
BMC Bioinformatics. 2006 Feb 23;7:91. doi: 10.1186/1471-2105-7-91.
6
Genetic test bed for feature selection.
Bioinformatics. 2006 Apr 1;22(7):837-42. doi: 10.1093/bioinformatics/btl008. Epub 2006 Jan 20.
7
Prediction error estimation: a comparison of resampling methods.
Bioinformatics. 2005 Aug 1;21(15):3301-7. doi: 10.1093/bioinformatics/bti499. Epub 2005 May 19.
8
On linear combinations of dichotomizers for maximizing the area under the ROC curve.
IEEE Trans Syst Man Cybern B Cybern. 2011 Jun;41(3):610-20. doi: 10.1109/TSMCB.2010.2060325. Epub 2010 Aug 30.
9
Classification based upon gene expression data: bias and precision of error rates.
Bioinformatics. 2007 Jun 1;23(11):1363-70. doi: 10.1093/bioinformatics/btm117. Epub 2007 Mar 28.
10
Support vector machines and other pattern recognition approaches to the diagnosis of cerebral palsy gait.
IEEE Trans Biomed Eng. 2006 Dec;53(12 Pt 1):2479-90. doi: 10.1109/TBME.2006.883697.

Cited by

1
Assessing Random Forest self-reproducibility for optimal short biomarker signature discovery.
Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf318.
2
Frequency domain analysis of steady-state visual evoked potentials in dogs with optic neuritis: a pilot study.
Front Vet Sci. 2025 Jun 24;12:1603620. doi: 10.3389/fvets.2025.1603620. eCollection 2025.
3
Curvature estimation techniques for advancing neurodegenerative disease analysis: a systematic review of machine learning and deep learning approaches.
Am J Neurodegener Dis. 2025 Feb 25;14(1):1-33. doi: 10.62347/DZNQ2482. eCollection 2025.
4
Protein Profiles Predict Treatment Responses to the PI3K Inhibitor Umbralisib in Patients with Chronic Lymphocytic Leukemia.
Clin Cancer Res. 2025 May 15;31(10):1943-1955. doi: 10.1158/1078-0432.CCR-24-2911.
5
Comparison of ROX index with modified indices incorporating heart rate, flow rate, and PaO2/FiO2 ratio for early prediction of outcomes among patients initiated on post-extubation high-flow nasal cannula therapy.
Eur J Med Res. 2025 Mar 14;30(1):166. doi: 10.1186/s40001-025-02402-z.
6
Assessments of lung nodules by an artificial intelligence chatbot using longitudinal CT images.
Cell Rep Med. 2025 Mar 18;6(3):101988. doi: 10.1016/j.xcrm.2025.101988. Epub 2025 Mar 4.
7
Machine learning predictive model for lumbar disc reherniation following microsurgical discectomy.
Brain Spine. 2024 Oct 10;4:103918. doi: 10.1016/j.bas.2024.103918. eCollection 2024.
8
Capturing biomarkers associated with Alzheimer disease subtypes using data distribution characteristics.
Front Comput Neurosci. 2024 Sep 3;18:1388504. doi: 10.3389/fncom.2024.1388504. eCollection 2024.
9
A microRNA diagnostic biomarker for amyotrophic lateral sclerosis.
Brain Commun. 2024 Sep 13;6(5):fcae268. doi: 10.1093/braincomms/fcae268. eCollection 2024.
10
Clinical cut-off scores for the Borderline Personality Features Scale for Children to differentiate among adolescents with Borderline Personality Disorder, other psychopathology, and no psychopathology: a replication study.
Borderline Personal Disord Emot Dysregul. 2024 Aug 26;11(1):21. doi: 10.1186/s40479-024-00264-1.