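Shrinkage estimation of effect sizes as an alternative to hypothesis testing followed by estimation in high-dimensional biology: applications to differential gene expression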

Montazeri Zahra, Yanofsky Corey M, Bickel David R

Ottawa Institute of Systems Biology.

Stat Appl Genet Mol Biol. 2010;9:Article23. doi: 10.2202/1544-6115.1504. Epub 2010 Jun 8.

Research on analyzing microarray data has focused on the problem of identifying differentially expressed genes to the neglect of the problem of how to integrate evidence that a gene is differentially expressed with information on the extent of its differential expression. Consequently, researchers currently prioritize genes for further study either on the basis of volcano plots or, more commonly, according to simple estimates of the fold change after filtering the genes with an arbitrary statistical significance threshold. While the subjective and informal nature of the former practice precludes quantification of its reliability, the latter practice is equivalent to using a hard-threshold estimator of the expression ratio that is not known to perform well in terms of mean-squared error, the sum of estimator variance and squared estimator bias. On the basis of two distinct simulation studies and data from different microarray studies, we systematically compared the performance of several estimators representing both current practice and shrinkage. We find that the threshold-based estimators usually perform worse than the maximum-likelihood estimator (MLE) and they often perform far worse as quantified by estimated mean-squared risk. By contrast, the shrinkage estimators tend to perform as well as or better than the MLE and never much worse than the MLE, as expected from what is known about shrinkage. However, a Bayesian measure of performance based on the prior information that few genes are differentially expressed indicates that hard-threshold estimators perform about as well as the local false discovery rate (FDR), the best of the shrinkage estimators studied. 
Based on the ability of the latter to leverage information across genes, we conclude that the use of the local-FDR estimator of the fold change instead of informal or threshold-based combinations of statistical tests and non-shrinkage estimators can be expected to substantially improve the reliability of gene prioritization at very little risk of doing so less reliably. Since the proposed replacement of post-selection estimates with shrunken estimates applies as well to other types of high-dimensional data, it could also improve the analysis of SNP data from genome-wide association studies.
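
The contrast drawn in the abstract between the MLE, a hard-threshold estimator, and a local-FDR shrinkage estimator can be illustrated with a small simulation. The sketch below is not the authors' simulation design: it assumes a two-group model in which a fraction `p0` of genes has a true log fold change of exactly zero, the remaining effects are drawn from a normal distribution with standard deviation `tau`, and each MLE has unit standard error. The local FDR is computed from the true mixture parameters rather than estimated from the data, and the shrinkage estimate is taken to be the MLE weighted by one minus the local FDR, which is one simple form assumed here for illustration. All function names, parameter values, and the 1.96 cutoff are hypothetical.

```python
import math
import random


def normal_pdf(x, sd):
    """Density at x of a zero-mean normal with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(x, p0, tau):
    """Posterior probability that a gene is null given its MLE x.

    Assumes unit standard error, so x ~ N(0, 1) under the null and
    x ~ N(0, 1 + tau^2) marginally under the alternative.
    """
    null = p0 * normal_pdf(x, 1.0)
    alt = (1.0 - p0) * normal_pdf(x, math.sqrt(1.0 + tau ** 2))
    return null / (null + alt)


def simulate_mse(n_genes=20000, p0=0.9, tau=3.0, cutoff=1.96, seed=1):
    """Return the average squared error of each estimator over simulated genes."""
    rng = random.Random(seed)
    sq_err = {"mle": 0.0, "hard_threshold": 0.0, "local_fdr": 0.0}
    for _ in range(n_genes):
        # True log fold change: zero for null genes, normal otherwise.
        theta = 0.0 if rng.random() < p0 else rng.gauss(0.0, tau)
        x = rng.gauss(theta, 1.0)  # observed MLE with unit standard error
        estimates = {
            "mle": x,
            # Keep the MLE only if it passes the significance cutoff.
            "hard_threshold": x if abs(x) > cutoff else 0.0,
            # Shrink the MLE toward zero by the probability of being null.
            "local_fdr": (1.0 - local_fdr(x, p0, tau)) * x,
        }
        for name, est in estimates.items():
            sq_err[name] += (est - theta) ** 2
    return {name: total / n_genes for name, total in sq_err.items()}


if __name__ == "__main__":
    for name, mse in simulate_mse().items():
        print(f"{name}: {mse:.3f}")
```

Because this sketch makes null genes common and uses the true mixture parameters, it resembles the favorable, sparse setting that the abstract's Bayesian performance measure describes, not the full range of conditions studied. Varying `p0`, `tau`, and `cutoff` shows how the ranking of the estimators depends on those assumptions.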


Similar articles

Shrinkage estimation of effect sizes as an alternative to hypothesis testing followed by estimation in high-dimensional biology: applications to differential gene expression.

Stat Appl Genet Mol Biol. 2010;9:Article23. doi: 10.2202/1544-6115.1504. Epub 2010 Jun 8.

Estimators of the local false discovery rate designed for small numbers of tests.

Stat Appl Genet Mol Biol. 2012 Oct 12;11(5):4. doi: 10.1515/1544-6115.1807.

Determination of the differentially expressed genes in microarray experiments using local FDR.

BMC Bioinformatics. 2004 Sep 6;5:125. doi: 10.1186/1471-2105-5-125.

Evaluation of a statistical equivalence test applied to microarray data.

J Biopharm Stat. 2010 Mar;20(2):240-66. doi: 10.1080/10543400903572738.

Empirical Bayes estimation of posterior probabilities of enrichment: a comparative study of five estimators of the local false discovery rate.

BMC Bioinformatics. 2013 Mar 6;14:87. doi: 10.1186/1471-2105-14-87.

Estimating the false discovery rate using nonparametric deconvolution.

Biometrics. 2007 Sep;63(3):806-15. doi: 10.1111/j.1541-0420.2006.00736.x.

Leveraging two-way probe-level block design for identifying differential gene expression with high-density oligonucleotide arrays.

BMC Bioinformatics. 2004 Apr 20;5:42. doi: 10.1186/1471-2105-5-42.

Improved estimation of the noncentrality parameter distribution from a large number of t-statistics, with applications to false discovery rate estimation in microarray data analysis.

Biometrics. 2012 Dec;68(4):1178-87. doi: 10.1111/j.1541-0420.2012.01764.x. Epub 2012 May 2.

Comparison of seven methods for producing Affymetrix expression scores based on False Discovery Rates in disease profiling data.

BMC Bioinformatics. 2005 Feb 10;6:26. doi: 10.1186/1471-2105-6-26.

A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data.

Bioinformatics. 2005 Dec 1;21(23):4280-8. doi: 10.1093/bioinformatics/bti685. Epub 2005 Sep 27.

Cited by

A novel significance score for gene selection and ranking.

Bioinformatics. 2014 Mar 15;30(6):801-7. doi: 10.1093/bioinformatics/btr671. Epub 2012 Feb 9.

Improved mean estimation and its application to diagonal discriminant analysis.

Bioinformatics. 2012 Feb 15;28(4):531-7. doi: 10.1093/bioinformatics/btr690. Epub 2011 Dec 14.
