Estimation of false discovery proportion under general dependence.

Author information

Pawitan Yudi, Calza Stefano, Ploner Alexander

Affiliation

Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.

Publication information

Bioinformatics. 2006 Dec 15;22(24):3025-31. doi: 10.1093/bioinformatics/btl527. Epub 2006 Oct 17.

DOI:10.1093/bioinformatics/btl527

PMID:17046978

Abstract

MOTIVATION

Wide-scale correlations between genes are commonly observed in gene expression data, due to both biological and technical reasons. These correlations increase the variability of the standard estimate of the false discovery rate (FDR). We highlight the false discovery proportion (FDP, instead of the FDR) as the suitable quantity for assessing differential expression in microarray data, demonstrate the deleterious effects of correlation on FDP estimation and propose an improved estimation method that accounts for the correlations.
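For reference: suppose R hypotheses are rejected and V of them are true nulls. The false discovery proportion of a single experiment is then FDP = V/max(R, 1), and the FDR is its expectation, FDR = E[FDP]. The Python sketch below shows why the distinction matters under dependence. It is not the authors' code. The equicorrelated z-statistics (induced by one shared latent factor), the effect size, the rejection threshold and the function name `simulate_fdp` are all illustrative assumptions. The sketch draws many experiments and records the realized FDP of each one.

```python
import numpy as np


def simulate_fdp(m=2000, m1=200, effect=3.0, rho=0.0, threshold=2.5, reps=500, seed=0):
    """Realized FDP = V / max(R, 1) in each of `reps` simulated experiments.

    Illustrative settings only: the first m1 of m tests carry a true mean shift
    `effect`, the rest are null, and every pair of z-statistics has correlation rho.
    """
    rng = np.random.default_rng(seed)
    is_null = np.arange(m) >= m1
    mean = np.where(is_null, 0.0, effect)
    fdp = np.empty(reps)
    for r in range(reps):
        # One shared N(0, 1) draw per experiment gives unit-variance noise
        # with pairwise correlation rho (equicorrelation).
        shared = rng.standard_normal()
        noise = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * rng.standard_normal(m)
        rejected = np.abs(mean + noise) > threshold
        n_rejected = rejected.sum()            # R
        n_false = (rejected & is_null).sum()   # V
        fdp[r] = n_false / max(n_rejected, 1)
    return fdp


fdp_indep = simulate_fdp(rho=0.0)
fdp_corr = simulate_fdp(rho=0.3)
# The FDR is the expectation of the FDP; the Monte Carlo mean approximates it.
print(f"independent: mean FDP {fdp_indep.mean():.3f}, SD {fdp_indep.std():.3f}")
print(f"correlated:  mean FDP {fdp_corr.mean():.3f}, SD {fdp_corr.std():.3f}")
```

Compare the two printed standard deviations. When the statistics are correlated, the realized FDP fluctuates much more around its mean, so a single FDR figure says less about the FDP of the one dataset actually analysed.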

METHODS

We analyse the variation pattern of the distribution of test statistics under permutation using the singular value decomposition. The results suggest a latent FDR model that accounts for the effects of correlation, and is statistically closer to the FDP. We develop a procedure for estimating the latent FDR (ELF) based on a Poisson regression model.
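The SVD step described above can be illustrated with a minimal Python sketch. It is not the ELF implementation; the authors' R code is at the URL given under AVAILABILITY. The following are all assumptions made for illustration: the simulated null data, the single latent factor used to correlate genes, the pooled-variance t-statistic, the binning on [-6, 6], and the function names. Each permutation of the group labels yields a histogram of gene-wise t-statistics. The sketch stacks these histograms as rows, centres the columns and takes the SVD to show which patterns of variation dominate across permutations.

```python
import numpy as np


def simulate_expression(n_genes=2000, n_per_group=10, factor_sd=1.0, seed=1):
    """Hypothetical null data: no true group effect; genes correlated via one latent factor."""
    rng = np.random.default_rng(seed)
    n = 2 * n_per_group
    loadings = rng.standard_normal(n_genes)       # gene-specific loading on the factor
    factor = factor_sd * rng.standard_normal(n)   # sample-level factor (0 => independent genes)
    x = np.outer(loadings, factor) + rng.standard_normal((n_genes, n))
    labels = np.repeat([1, 0], n_per_group)       # 1 = group A, 0 = group B
    return x, labels


def t_statistics(x, labels):
    """Pooled-variance two-sample t-statistic for every row (gene) of x."""
    a, b = x[:, labels == 1], x[:, labels == 0]
    n1, n2 = a.shape[1], b.shape[1]
    pooled = ((n1 - 1) * a.var(axis=1, ddof=1) + (n2 - 1) * b.var(axis=1, ddof=1)) / (n1 + n2 - 2)
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(pooled * (1.0 / n1 + 1.0 / n2))


def permutation_histograms(x, labels, n_perm=200, edges=None, seed=0):
    """Row p holds the binned counts of all gene t-statistics under the p-th label permutation."""
    if edges is None:
        edges = np.linspace(-6, 6, 41)
    rng = np.random.default_rng(seed)
    counts = np.empty((n_perm, len(edges) - 1))
    for p in range(n_perm):
        t = t_statistics(x, rng.permutation(labels))
        counts[p], _ = np.histogram(t, bins=edges)  # values outside [-6, 6] are dropped
    return counts


def variation_modes(counts):
    """SVD of the column-centred count matrix: variance share per component, and bin patterns."""
    centred = counts - counts.mean(axis=0)
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    return s**2 / np.sum(s**2), vt


x_corr, labels = simulate_expression(factor_sd=1.0)
x_indep, _ = simulate_expression(factor_sd=0.0)
share_corr, modes_corr = variation_modes(permutation_histograms(x_corr, labels))
share_indep, _ = variation_modes(permutation_histograms(x_indep, labels))
print("variance share, first 3 components, correlated genes: ", np.round(share_corr[:3], 3))
print("variance share, first 3 components, independent genes:", np.round(share_indep[:3], 3))
```

In this toy setup, the shared factor mainly widens or narrows the whole permutation distribution from one permutation to the next. As a result, the first component (the first row of `modes_corr`) carries most of the variation. With independent genes, the variation is spread over many components. The abstract does not say how such components enter the Poisson regression used by ELF, so that step is not sketched here.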

RESULTS

For simulated data based on the correlation structure of real datasets, we find that ELF performs substantially better than the standard FDR approach in estimating the FDP. We illustrate the use of ELF in the analysis of breast cancer and lymphoma data.

AVAILABILITY

R code to perform ELF is available at http://www.meb.ki.se/~yudpaw.

Similar articles

Multidimensional local false discovery rate for microarray studies.

Bioinformatics. 2006 Mar 1;22(5):556-65. doi: 10.1093/bioinformatics/btk013. Epub 2005 Dec 20.

Bias in the estimation of false discovery rate in microarray studies.

Bioinformatics. 2005 Oct 15;21(20):3865-72. doi: 10.1093/bioinformatics/bti626. Epub 2005 Aug 16.

Unequal group variances in microarray data analyses.

Bioinformatics. 2008 May 1;24(9):1168-74. doi: 10.1093/bioinformatics/btn100. Epub 2008 Mar 14.

Quick calculation for sample size while controlling false discovery rate with application to microarray analysis.

Bioinformatics. 2007 Mar 15;23(6):739-46. doi: 10.1093/bioinformatics/btl664. Epub 2007 Jan 19.

Practical FDR-based sample size calculations in microarray experiments.

Bioinformatics. 2005 Aug 1;21(15):3264-72. doi: 10.1093/bioinformatics/bti519. Epub 2005 Jun 2.

A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data.

Bioinformatics. 2005 Dec 1;21(23):4280-8. doi: 10.1093/bioinformatics/bti685. Epub 2005 Sep 27.

Robust estimation of the false discovery rate.

Bioinformatics. 2006 Aug 15;22(16):1979-87. doi: 10.1093/bioinformatics/btl328. Epub 2006 Jun 15.

Empirical Bayes screening of many p-values with applications to microarray studies.

Bioinformatics. 2005 May 1;21(9):1987-94. doi: 10.1093/bioinformatics/bti301. Epub 2005 Feb 2.

Exploiting sample variability to enhance multivariate analysis of microarray data.

Bioinformatics. 2007 Oct 15;23(20):2733-40. doi: 10.1093/bioinformatics/btm441. Epub 2007 Sep 7.

Cited by

Beware of counter-intuitive levels of false discoveries in datasets with strong intra-correlations.

Genome Biol. 2025 Aug 18;26(1):249. doi: 10.1186/s13059-025-03734-z.

Exact Integral Formulas for False Discovery Rate and the Variance of False Discovery Proportion.

J Proteome Res. 2024 Jun 7;23(6):2298-2305. doi: 10.1021/acs.jproteome.3c00842. Epub 2024 May 29.

Mixture prior for sparse signals with dependent covariance structure.

PLoS One. 2023 Apr 27;18(4):e0284284. doi: 10.1371/journal.pone.0284284. eCollection 2023.

fdrci: FDR confidence interval selection and adjustment for large-scale hypothesis testing.

Bioinform Adv. 2022 Jun 13;2(1):vbac047. doi: 10.1093/bioadv/vbac047. eCollection 2022.

Identifying and Assessing Interesting Subgroups in a Heterogeneous Population.

Biomed Res Int. 2015;2015:462549. doi: 10.1155/2015/462549. Epub 2015 Aug 3.

Identification of significant features in DNA microarray data.

Wiley Interdiscip Rev Comput Stat. 2013 Jul;5(4). doi: 10.1002/wics.1260.

Empirical null distribution based modeling of multi-class differential gene expression detection.

J Appl Stat. 2013 Feb 1;40(2):347-357. doi: 10.1080/02664763.2012.743976. Epub 2012 Nov 21.

Sources of variation in false discovery rate estimation include sample size, correlation, and inherent differences between groups.

BMC Bioinformatics. 2012;13 Suppl 13(Suppl 13):S1. doi: 10.1186/1471-2105-13-S13-S1. Epub 2012 Aug 24.

Genome-wide association studies and the genetic dissection of complex traits.

Am J Hematol. 2009 Aug;84(8):504-15. doi: 10.1002/ajh.21440.

Comments on the analysis of unbalanced microarray data.

Bioinformatics. 2009 Aug 15;25(16):2035-41. doi: 10.1093/bioinformatics/btp363. Epub 2009 Jun 15.
