Rossell David, Guerra Rudy, Scott Clayton
Institute for Research in Biomedicine of Barcelona.
Stat Appl Genet Mol Biol. 2008;7(1):Article15. doi: 10.2202/1544-6115.1333. Epub 2008 Apr 28.
We develop an approach for microarray differential expression analysis, i.e. identifying genes whose expression levels differ between two or more groups. Current approaches to inference rely either on fully parametric assumptions or on permutation-based techniques for sampling under the null distribution. In some situations, however, a fully parametric model cannot be justified, or the sample size per group is too small for permutation methods to be valid. We propose a semi-parametric framework based on partial mixture estimation, which requires a parametric assumption only for the null (equally expressed, EE) distribution and can handle small sample sizes where permutation methods break down. We develop two novel improvements of Scott's minimum integrated square error criterion for partial mixture estimation [Scott, 2004a,b]. As a side benefit, we obtain interpretable, closed-form estimates of the proportion of EE genes. Pseudo-Bayesian and frequentist procedures for controlling the false discovery rate (FDR) are given. Results from simulations and real datasets indicate that, for small sample sizes, our approach can provide substantial advantages over the SAM method of Tusher et al. [2001], the empirical Bayes procedure of Efron and Tibshirani [2002], the mixture of normals of Pan et al. [2003], and a t-test with p-values adjusted [Dudoit et al., 2003] to control the FDR [Benjamini and Hochberg, 1995].
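To illustrate why a closed-form estimate of the EE proportion is plausible, here is a minimal sketch of the basic integrated square error idea behind partial mixture estimation. It is not the authors' method: the abstract does not spell out their two improvements, so the sketch uses the plain criterion under strong assumptions, namely that the per-gene statistics have a fully specified normal null N(mu, sigma^2). For a partial model w*phi(x), the criterion w^2 * Int(phi^2) - (2w/n) * Sum(phi(x_i)) is quadratic in w, so its minimizer is mean(phi(x_i)) / Int(phi^2), with Int(phi^2) = 1 / (2 * sigma * sqrt(pi)). The function names, the simulated data, and the parameter values are all hypothetical.

```python
import math
import random


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def ise_criterion(w, xs, mu=0.0, sigma=1.0):
    """Integrated square error criterion for the partial model w * phi,
    dropping the term that does not depend on w."""
    integral_phi_sq = 1.0 / (2.0 * sigma * math.sqrt(math.pi))
    mean_phi = sum(normal_pdf(x, mu, sigma) for x in xs) / len(xs)
    return w * w * integral_phi_sq - 2.0 * w * mean_phi


def null_weight(xs, mu=0.0, sigma=1.0):
    """Closed-form minimizer of ise_criterion over w.
    Assumption: the null N(mu, sigma^2) is known. The estimate is not
    constrained to [0, 1], so it can slightly exceed 1 by chance."""
    integral_phi_sq = 1.0 / (2.0 * sigma * math.sqrt(math.pi))
    mean_phi = sum(normal_pdf(x, mu, sigma) for x in xs) / len(xs)
    return mean_phi / integral_phi_sq


def simulate(n_null, n_alt, shift, seed=0):
    """Hypothetical per-gene statistics: n_null draws from N(0, 1)
    (equally expressed) and n_alt draws from N(shift, 1)."""
    rng = random.Random(seed)
    null_part = [rng.gauss(0.0, 1.0) for _ in range(n_null)]
    alt_part = [rng.gauss(shift, 1.0) for _ in range(n_alt)]
    return null_part + alt_part


if __name__ == "__main__":
    stats = simulate(n_null=8000, n_alt=2000, shift=6.0)
    w_hat = null_weight(stats)
    print(f"estimated proportion of EE genes: {w_hat:.3f}")
```

Only the null component is modelled here; the alternative (differentially expressed) component is left unspecified, which is what makes the approach semi-parametric. When the alternative overlaps the null substantially, this simple estimate is biased upwards, since overlapping alternative observations also contribute to the mean null density.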