Gerard David, Stephens Matthew
Department of Mathematics and Statistics, American University, Washington, DC 20016, USA.
Departments of Human Genetics and Statistics, University of Chicago, Chicago, IL 60637, USA.
Stat Sin. 2021 Jul;31(3):1145-1166. doi: 10.5705/ss.202018.0345.
Unwanted variation, including hidden confounding, is a well-known problem in many fields, particularly in large-scale gene expression studies. Recent proposals to use control genes (genes assumed to be unassociated with the covariates of interest) have led to new methods for dealing with this problem. Several versions of these removing unwanted variation (RUV) methods have been proposed, including RUV1, RUV2, RUV4, RUVinv, RUVrinv, and RUVfun. Here, we introduce a general framework, RUV*, that both unites and generalizes these approaches. This unifying framework helps clarify the connections between existing methods. In particular, we provide conditions under which RUV2 and RUV4 are equivalent. The RUV* framework preserves an advantage of the RUV approaches, namely their modularity, which facilitates the development of novel methods based on existing matrix imputation algorithms. We illustrate this by implementing RUVB, a version of RUV* based on Bayesian factor analysis. In realistic simulations based on real data, we found RUVB to be competitive with existing methods in terms of both power and calibration. However, providing consistently reliable calibration across data sets remains challenging.
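To make the role of control genes concrete, the following is a minimal sketch of the RUV2 idea named above: estimate unwanted factors by factor analysis of the control genes only, then include those estimated factors as covariates when regressing each gene on the covariate of interest. This is an illustration under stated assumptions, not the paper's RUV* framework or its RUVB implementation. The function name `ruv2`, the use of a truncated SVD in place of a proper factor model, the assumed number of factors `q`, and all simulation settings are hypothetical choices made for this example.

```python
import numpy as np


def ruv2(Y, X, ctl, q):
    """Hypothetical RUV2-style estimator (illustrative only).

    Y   : (n, p) expression matrix, samples in rows, genes in columns
    X   : (n, k) design matrix of covariates of interest
    ctl : boolean mask of length p marking control genes (assumed effect = 0)
    q   : assumed number of unwanted factors
    Returns a (k, p) matrix of estimated effects of X on each gene.
    """
    # Step 1: factor analysis on the control genes only. A truncated SVD
    # stands in for a proper factor model in this sketch.
    U, _, _ = np.linalg.svd(Y[:, ctl], full_matrices=False)
    Z_hat = U[:, :q]  # (n, q); only its column space matters for step 2
    # Step 2: regress every gene on [X, Z_hat] and keep the X coefficients.
    design = np.hstack([X, Z_hat])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef[: X.shape[1], :]


# Simulated data; every setting below is an arbitrary assumption.
rng = np.random.default_rng(0)
n, p, q, n_ctl = 100, 1000, 3, 200
x = rng.integers(0, 2, size=n).astype(float)   # binary covariate of interest
X = np.column_stack([np.ones(n), x])           # intercept + covariate
Z = rng.normal(size=(n, q)) + x[:, None]       # hidden factors confounded with x
alpha = rng.normal(size=(q, p))                # factor loadings for every gene
ctl = np.zeros(p, dtype=bool)
ctl[:n_ctl] = True                             # first n_ctl genes are controls
beta = np.zeros((2, p))
nonnull = (~ctl) & (rng.random(p) < 0.2)       # about 20% of non-controls affected
beta[1, nonnull] = rng.normal(size=nonnull.sum())
Y = X @ beta + Z @ alpha + rng.normal(size=(n, p))

naive, *_ = np.linalg.lstsq(X, Y, rcond=None)  # ignores unwanted variation
adjusted = ruv2(Y, X, ctl, q)

# Mean squared error of the covariate effect on the non-control genes.
naive_err = np.mean((naive[1, ~ctl] - beta[1, ~ctl]) ** 2)
ruv_err = np.mean((adjusted[1, ~ctl] - beta[1, ~ctl]) ** 2)
print(f"MSE naive: {naive_err:.3f}  MSE RUV2-style: {ruv_err:.3f}")
```

Because the hidden factors are correlated with `x`, the naive regression absorbs their effect into the estimated coefficient, whereas the control genes let the sketch recover the confounders' column space and adjust for it. The sketch says nothing about calibration of standard errors, which the abstract identifies as the harder problem.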