Department of Statistics, University of California at Berkeley, Berkeley, CA 94720-3860, USA.
Biostatistics. 2012 Jul;13(3):539-52. doi: 10.1093/biostatistics/kxr034. Epub 2011 Nov 17.
Microarray expression studies suffer from batch effects and other unwanted variation. Many methods have been proposed to adjust microarray data to mitigate this problem, and several of them rely on factor analysis to infer the unwanted variation from the data. A central difficulty with this approach is distinguishing the unwanted variation from the biological variation that is of interest to the researcher. We present a new method, intended for use in differential expression studies, that attempts to overcome this difficulty by restricting the factor analysis to negative control genes. Negative control genes are genes known a priori not to be differentially expressed with respect to the biological factor of interest. Variation in the expression levels of these genes can therefore be assumed to be unwanted variation. We name this method "Remove Unwanted Variation, 2-step" (RUV-2). We discuss various techniques for assessing the performance of an adjustment method and compare the performance of RUV-2 with that of other commonly used adjustment methods such as ComBat and Surrogate Variable Analysis (SVA). We present several example studies, each concerning genes differentially expressed with respect to gender in the brain, and find that RUV-2 performs as well as or better than the other methods. Finally, we discuss the possibility of adapting RUV-2 for use in studies not concerned with differential expression and conclude that there may be promise, but substantial challenges remain.
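The two-step idea described above can be illustrated with a minimal sketch. The abstract does not give implementation details, so the following rests on assumptions: the factor analysis step is taken to be a singular value decomposition of the negative control genes, the second step is taken to be ordinary least squares on the factor of interest plus the estimated unwanted factors, and the number of unwanted factors `k` is supplied by the user. The function name `ruv2_sketch` and all variable names are hypothetical, and the data are synthetic, generated only to exercise the code; they are not results from the paper.

```python
import numpy as np


def ruv2_sketch(Y, X, ctl, k):
    """Illustrative two-step adjustment (assumed form, not the authors' code).

    Y   : (m samples, n genes) matrix of log expression values
    X   : (m, p) matrix encoding the biological factor of interest
    ctl : boolean mask of length n marking negative control genes
    k   : number of unwanted factors to estimate (user-chosen assumption)
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(Y.shape[0], -1)

    # Step 1: factor analysis restricted to the negative control genes.
    # Assumption: a truncated SVD stands in for the factor analysis.
    U, s, _ = np.linalg.svd(Y[:, ctl], full_matrices=False)
    W_hat = U[:, :k] * s[:k]

    # Step 2: regress every gene on X and the estimated unwanted factors.
    design = np.hstack([X, W_hat])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    beta_hat = coef[: X.shape[1], :]
    return beta_hat, W_hat


# Synthetic data: unwanted factors that are correlated with the factor of interest.
rng = np.random.default_rng(0)
m, n, k = 40, 200, 2
X = np.repeat([-1.0, 1.0], m // 2).reshape(-1, 1)   # two balanced groups
W_true = 0.8 * X + rng.normal(size=(m, k))          # unwanted variation
alpha = rng.normal(size=(k, n))                     # gene-wise loadings
beta = np.zeros((1, n))
beta[0, :20] = 2.0                                  # 20 truly differential genes
ctl = np.zeros(n, dtype=bool)
ctl[100:] = True                                    # controls have beta = 0
Y = X @ beta + W_true @ alpha + 0.01 * rng.normal(size=(m, n))

beta_ruv, W_hat = ruv2_sketch(Y, X, ctl, k)
beta_naive = np.linalg.lstsq(X, Y, rcond=None)[0]   # no adjustment at all

err_ruv = np.abs(beta_ruv - beta).max()
err_naive = np.abs(beta_naive - beta).max()
print("max error with adjustment:   ", err_ruv)
print("max error without adjustment:", err_naive)
```

In this toy setting the unadjusted regression absorbs part of the unwanted variation into the estimated effects, while the adjusted regression does not, because the control genes expose the unwanted factors without any contribution from the factor of interest. Choosing `k` and selecting reliable control genes are the hard parts in practice and are not addressed by this sketch.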