Model selection based on FDR-thresholding optimizing the area under the ROC-curve.

Author information

Graf Alexandra C, Bauer Peter

Affiliation

Medical University of Vienna.

Publication information

Stat Appl Genet Mol Biol. 2009;8:Article31. doi: 10.2202/1544-6115.1462. Epub 2009 Jun 25.

DOI:10.2202/1544-6115.1462

PMID:19572830

Abstract

We evaluate variable selection by multiple tests controlling the false discovery rate (FDR) to build a linear score for prediction of clinical outcome in high-dimensional data. Quality of prediction is assessed by the receiver operating characteristic curve (ROC) for prediction in independent patients. Thus we try to combine both goals: prediction and controlled structure estimation. We show that the FDR-threshold which provides the ROC-curve with the largest area under the curve (AUC) varies largely over the different parameter constellations not known in advance. Hence, we investigated a new cross validation procedure based on the maximum rank correlation estimator to determine the optimal selection threshold. This procedure (i) allows choosing an appropriate selection criterion, (ii) provides an estimate of the FDR close to the true FDR and (iii) is simple and computationally feasible for rather moderate to small sample sizes. Low estimates of the cross validated AUC (the estimates generally being positively biased) and large estimates of the cross validated FDR may indicate a lack of sufficiently prognostic variables and/or too small sample sizes. The method is applied to an oncology dataset.
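
The abstract combines three ingredients that can be made concrete: Benjamini-Hochberg selection at an FDR level q, a linear score built from the selected variables, and the AUC of that score in held-out patients. The standard-library Python sketch below chooses q from a grid by stratified k-fold cross-validated AUC. It illustrates the idea under stated assumptions and is not the authors' procedure. The per-variable z-test, the sign-weighted standardized score (`fit_score`), the q grid and the fold scheme are all hypothetical choices. AUC is used as the criterion because, for a 0/1 outcome, the maximum rank correlation objective counts pairs in which a case scores above a control. Divided by the number of case-control pairs, that count is the empirical AUC (ties aside).

```python
import random
from statistics import NormalDist, mean, stdev


def two_sample_pvalues(X, y):
    """Per-variable two-sided p-values and effect signs.

    Assumption: a large-sample z-test (Welch-type standard error with a
    normal reference), chosen for illustration only.
    """
    pvals, signs = [], []
    for j in range(len(X[0])):
        a = [row[j] for row, yi in zip(X, y) if yi == 1]
        b = [row[j] for row, yi in zip(X, y) if yi == 0]
        se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
        z = (mean(a) - mean(b)) / se if se > 0 else 0.0
        pvals.append(2.0 * (1.0 - NormalDist().cdf(abs(z))))
        signs.append(1.0 if z >= 0 else -1.0)
    return pvals, signs


def benjamini_hochberg(pvals, q):
    """Indices rejected by the Benjamini-Hochberg step-up rule at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda j: pvals[j])
    k = 0
    for rank, j in enumerate(order, start=1):
        if pvals[j] <= rank * q / m:
            k = rank  # keep the largest rank that passes (step-up)
    return sorted(order[:k])


def fit_score(X, y, q):
    """Select variables by BH at level q and return a linear score function.

    Hypothetical score: sum of sign(effect) * standardized value over the
    selected variables, standardized with training-set means and SDs.
    """
    pvals, signs = two_sample_pvalues(X, y)
    selected = benjamini_hochberg(pvals, q)
    params = {}
    for j in selected:
        col = [row[j] for row in X]
        sd = stdev(col)
        params[j] = (mean(col), sd if sd > 0 else 1.0, signs[j])

    def score(row):
        # An empty selection gives a constant score, i.e. AUC 0.5.
        return sum(s * (row[j] - mu) / sd for j, (mu, sd, s) in params.items())

    return score, selected


def auc(scores, y):
    """Empirical AUC (Mann-Whitney form); case-control ties count 1/2."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def cv_select_q(X, y, q_grid, k=5, seed=0):
    """Pick the FDR level q with the largest mean held-out AUC (stratified k-fold)."""
    rng = random.Random(seed)
    pos = [i for i, yi in enumerate(y) if yi == 1]
    neg = [i for i, yi in enumerate(y) if yi == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    # Stratified folds: each fold holds both classes if each class has >= k members.
    folds = [pos[f::k] + neg[f::k] for f in range(k)]
    cv_auc = {}
    for q in q_grid:
        fold_aucs = []
        for test in folds:
            held_out = set(test)
            train = [i for i in range(len(y)) if i not in held_out]
            score, _ = fit_score([X[i] for i in train], [y[i] for i in train], q)
            fold_aucs.append(auc([score(X[i]) for i in test], [y[i] for i in test]))
        cv_auc[q] = mean(fold_aucs)
    best_q = max(q_grid, key=lambda q: cv_auc[q])
    return best_q, cv_auc


if __name__ == "__main__":
    # Hypothetical simulated data: 60 patients, 200 variables, the first 5 prognostic.
    rng = random.Random(1)
    y = [i % 2 for i in range(60)]
    X = [[rng.gauss(1.0 if (yi == 1 and j < 5) else 0.0, 1.0) for j in range(200)]
         for yi in y]
    best_q, cv_auc = cv_select_q(X, y, [0.01, 0.05, 0.1, 0.2, 0.5])
    print("chosen q:", best_q)
    print({q: round(a, 3) for q, a in cv_auc.items()})
```

The AUC reported for the chosen q is the maximum over the grid, so it is optimistic. This is consistent with the abstract's remark that cross-validated AUC estimates are generally positively biased. An outer validation loop or independent patients would be needed for an unbiased estimate.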

Similar articles

Work efficiency: a new criterion for comprehensive comparison and evaluation of statistical methods in large-scale identification of differentially expressed genes.

Genomics. 2011 Nov;98(5):390-9. doi: 10.1016/j.ygeno.2011.05.006. Epub 2011 Jun 30.

A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data.

Bioinformatics. 2005 Dec 1;21(23):4280-8. doi: 10.1093/bioinformatics/bti685. Epub 2005 Sep 27.

False discovery rate, sensitivity and sample size for microarray studies.

Bioinformatics. 2005 Jul 1;21(13):3017-24. doi: 10.1093/bioinformatics/bti448. Epub 2005 Apr 19.

Small-sample precision of ROC-related estimates.

Bioinformatics. 2010 Mar 15;26(6):822-30. doi: 10.1093/bioinformatics/btq037. Epub 2010 Feb 3.

Estimating the false discovery rate using nonparametric deconvolution.

Biometrics. 2007 Sep;63(3):806-15. doi: 10.1111/j.1541-0420.2006.00736.x.

Bias in the estimation of false discovery rate in microarray studies.

Bioinformatics. 2005 Oct 15;21(20):3865-72. doi: 10.1093/bioinformatics/bti626. Epub 2005 Aug 16.

Multidimensional local false discovery rate for microarray studies.

Bioinformatics. 2006 Mar 1;22(5):556-65. doi: 10.1093/bioinformatics/btk013. Epub 2005 Dec 20.

A new parametric method based on S-distributions for computing receiver operating characteristic curves for continuous diagnostic tests.

Stat Med. 2002 May 15;21(9):1213-35. doi: 10.1002/sim.1086.

Bias in error estimation when using cross-validation for model selection.

BMC Bioinformatics. 2006 Feb 23;7:91. doi: 10.1186/1471-2105-7-91.

Cited by

Identification of Novel Signal Transduction, Immune Function, and Oxidative Stress Genes and Pathways by Topiramate for Treatment of Methamphetamine Dependence Based on Secondary Outcomes.

Front Psychiatry. 2017 Dec 13;8:271. doi: 10.3389/fpsyt.2017.00271. eCollection 2017.

Literature aided determination of data quality and statistical significance threshold for gene expression studies.

BMC Genomics. 2012;13 Suppl 8(Suppl 8):S23. doi: 10.1186/1471-2164-13-S8-S23. Epub 2012 Dec 17.