Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, Texas, USA.
BMC Bioinformatics. 2009 Dec 8;10:402. doi: 10.1186/1471-2105-10-402.
Many researchers use a double filtering procedure, combining fold change with the t-test, to identify differentially expressed genes, in the hope that the double filtering will provide extra confidence in the results. Because of its simplicity, the procedure has remained popular with applied researchers despite the development of more sophisticated methods.
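As an illustration of the procedure described above, the following minimal sketch keeps a gene only if it passes both a fold-change cutoff and a t-statistic cutoff. The function names, the cutoffs (`fc_cutoff`, `t_cutoff`) and the toy expression values are assumptions made for this example and are not taken from the paper; the expression values are assumed to be on the log2 scale, and a pooled-variance two-sample t-statistic is used.

```python
import math
import statistics


def fold_change_and_t(group_a, group_b):
    """Return (log fold change, pooled-variance t-statistic) for one gene.

    Inputs are assumed to be log2-scale expression values, so the fold
    change is simply the difference of the two group means.
    """
    n_a, n_b = len(group_a), len(group_b)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    # Note: a gene with zero variance in both groups would divide by zero here.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return diff, diff / std_err


def double_filter(genes, fc_cutoff=1.0, t_cutoff=2.776):
    """Keep genes whose |log fold change| and |t| both exceed their cutoffs.

    The default t_cutoff is the two-sided 5% critical value of a t
    distribution with 4 degrees of freedom (3 samples per group); both
    cutoffs are illustrative choices only.
    """
    selected = []
    for name, (group_a, group_b) in genes.items():
        diff, t_stat = fold_change_and_t(group_a, group_b)
        if abs(diff) >= fc_cutoff and abs(t_stat) >= t_cutoff:
            selected.append(name)
    return selected


# Hypothetical toy data: three genes, three samples per group.
genes = {
    "gene_large_noisy": ([8.0, 11.0, 9.5], [7.0, 9.0, 8.0]),
    "gene_small_stable": ([5.30, 5.32, 5.31], [5.00, 5.02, 5.01]),
    "gene_large_stable": ([9.0, 9.2, 9.1], [7.0, 7.1, 6.9]),
}

print(double_filter(genes))
```

In this toy data, the noisy gene passes the fold-change filter but not the t filter, the stable gene with a small difference passes the t filter but not the fold-change filter, and only the gene with a large, stable difference survives both.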
This paper, for the first time to our knowledge, provides theoretical insight into the drawback of the double filtering procedure. We show that fold change assumes that all genes share a common variance, whereas the t-statistic assumes gene-specific variances, so the two statistics rest on contradictory assumptions. Under the assumption that gene variances arise from a mixture of a common variance and gene-specific variances, we develop the theoretically most powerful likelihood ratio test statistic. We further demonstrate that posterior inference based on a Bayesian mixture model and the widely used significance analysis of microarrays (SAM) statistic are better approximations to the likelihood ratio test than the double filtering procedure.
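To make the contrast between the statistics concrete, they can be written side by side in their commonly used textbook forms. The notation below is an assumption introduced for this illustration and is not necessarily the paper's: for gene $i$, $\bar{x}_{i1}$ and $\bar{x}_{i2}$ are the group means on the log scale, $n_1$ and $n_2$ are the group sizes, $s_i$ is the pooled standard deviation of gene $i$, $\sigma$ is a standard deviation shared by all genes, and $s_0 > 0$ is the SAM shrinkage constant.

```latex
% Notation is assumed for illustration only.
\begin{align*}
  c &= \sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}}
    && \text{(scaling constant, the same for every gene)} \\
  \mathrm{FC}_i &= \bar{x}_{i1} - \bar{x}_{i2}
    && \text{(log fold change)} \\
  z_i &= \frac{\bar{x}_{i1} - \bar{x}_{i2}}{\sigma\, c}
    && \text{(common variance: ranks genes exactly as } \mathrm{FC}_i\text{)} \\
  t_i &= \frac{\bar{x}_{i1} - \bar{x}_{i2}}{s_i\, c}
    && \text{(gene-specific variance)} \\
  d_i &= \frac{\bar{x}_{i1} - \bar{x}_{i2}}{s_i\, c + s_0}
    && \text{(SAM-type shrinkage statistic)}
\end{align*}
```

Read this way, the SAM-type statistic sits between the two extremes: setting $s_0 = 0$ recovers $t_i$, while letting $s_0$ grow large makes the ranking of genes approach the ranking by fold change.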
We demonstrate, through hypothesis testing theory, simulation studies and real data examples, that well-constructed shrinkage testing methods, which can be united under the mixture gene variance assumption, can considerably outperform the double filtering procedure.