Pounds Stanley B, Gao Cuilan L, Zhang Hui
St. Jude Children's Research Hospital.
Stat Appl Genet Mol Biol. 2012 Oct 19;11(5). doi: 10.1515/1544-6115.1773.
Differential expression analysis of sequence-count expression data involves performing a large number of hypothesis tests that compare the expression count data of each gene or transcript across two or more biological conditions. The assumptions of any specific hypothesis-testing method will probably not be valid for each of a very large number of genes. Thus, computational evaluation of assumptions should be incorporated into the analysis to select an appropriate hypothesis-testing method for each gene. Here, we generalize earlier work to introduce two novel procedures that use estimates of the empirical Bayesian probability (EBP) of overdispersion to select or combine results of a standard Poisson likelihood ratio test and a quasi-likelihood test for each gene. These EBP-based procedures simultaneously evaluate the Poisson-distribution assumption and account for multiple testing. With adequate power to detect overdispersion, the new procedures select the standard likelihood test for each gene with Poisson-distributed counts and the quasi-likelihood test for each gene with overdispersed counts. The new procedures outperformed previously published methods in many simulation studies. We also present a real-data analysis example and discuss how the framework used to develop the new procedures may be generalized to further enhance performance. An R code library that implements the methods is freely available at www.stjuderesearch.org/depts/biostats/software.
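The per-gene logic described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' R library, and several pieces are assumptions made only for illustration: the two groups are taken to have equal library sizes, the quasi-likelihood test is approximated by dividing the Poisson likelihood ratio statistic by a Pearson dispersion estimate and referring it to a chi-square distribution with 1 degree of freedom, the dispersion estimate is floored at 1, and the EBP of overdispersion is supplied as an input rather than estimated. The function names, the 0.5 selection cutoff, and the linear weighting rule are all hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square(1) variable: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def poisson_lr_stat(counts_a, counts_b):
    """Likelihood ratio statistic for equal Poisson means in two groups.

    Assumes equal library sizes (no offsets), purely for illustration.
    """
    n_a, n_b = len(counts_a), len(counts_b)
    s_a, s_b = sum(counts_a), sum(counts_b)
    pooled_mean = (s_a + s_b) / (n_a + n_b)

    def term(total, n):
        # total * log(group mean / pooled mean); contributes 0 when the group is all zeros
        return total * math.log((total / n) / pooled_mean) if total > 0 else 0.0

    return max(2.0 * (term(s_a, n_a) + term(s_b, n_b)), 0.0)


def pearson_dispersion(counts_a, counts_b):
    """Pearson chi-square over residual degrees of freedom, using group-specific means.

    Floored at 1.0 (an assumption) so the quasi test is never more liberal than the Poisson test.
    """
    pearson = 0.0
    for group in (counts_a, counts_b):
        mean = sum(group) / len(group)
        if mean > 0:
            pearson += sum((y - mean) ** 2 for y in group) / mean
    df = len(counts_a) + len(counts_b) - 2
    return max(pearson / df, 1.0)


def analyze_gene(counts_a, counts_b, ebp_overdispersion, cutoff=0.5):
    """Return Poisson, quasi-likelihood, selected, and weighted p-values for one gene.

    ebp_overdispersion is a user-supplied probability in [0, 1]; how it is
    estimated is outside the scope of this sketch.
    """
    stat = poisson_lr_stat(counts_a, counts_b)
    phi = pearson_dispersion(counts_a, counts_b)
    p_poisson = chi2_sf_1df(stat)
    p_quasi = chi2_sf_1df(stat / phi)  # chi-square approximation to the quasi-likelihood test
    # "Select": pick one test according to the EBP (hypothetical cutoff).
    p_selected = p_quasi if ebp_overdispersion >= cutoff else p_poisson
    # "Combine": EBP-weighted average of the two p-values (hypothetical rule).
    p_weighted = (1.0 - ebp_overdispersion) * p_poisson + ebp_overdispersion * p_quasi
    return {
        "dispersion": phi,
        "p_poisson": p_poisson,
        "p_quasi": p_quasi,
        "p_selected": p_selected,
        "p_weighted": p_weighted,
    }


if __name__ == "__main__":
    # A gene whose counts vary far more within groups than a Poisson model allows.
    result = analyze_gene([1, 40, 2, 37], [5, 60, 3, 92], ebp_overdispersion=0.9)
    for name, value in result.items():
        print(f"{name}: {value:.4g}")
```

For the overdispersed example gene, the Poisson test alone reports a very small p-value, while the dispersion-adjusted test does not. That gap is the situation in which choosing the test by the evidence for overdispersion matters.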