Palowitch John, Shabalin Andrey, Zhou Yi-Hui, Nobel Andrew B, Wright Fred A
Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A.
Department of Psychiatry, University of Utah, Salt Lake City, Utah 84108, U.S.A.
Biometrics. 2018 Jun;74(2):616-625. doi: 10.1111/biom.12810. Epub 2017 Oct 26.
The study of expression Quantitative Trait Loci (eQTL) is an important problem in genomics and biomedicine. While detection (testing) of eQTL associations has been widely studied, less work has been devoted to the estimation of eQTL effect size. To reduce false positives, detection methods frequently rely on linear modeling of rank-based normalized or log-transformed gene expression data. Unfortunately, these approaches do not correspond to the simplest model of eQTL action, and thus yield estimates of eQTL association that can be uninterpretable and inaccurate. In this article, we propose a new, log-of-linear model for eQTL action, termed ACME, that captures allelic contributions to cis-acting eQTLs in an additive fashion, yielding effect size estimates that correspond to a biologically coherent model of cis-eQTLs. We describe a non-linear least-squares algorithm to fit the model by maximum likelihood, and obtain corresponding p-values. We investigate the model carefully using a combination of simulated data and data from the Genotype-Tissue Expression (GTEx) project. Our results reveal little evidence for dominance effects, a parsimonious result that accords with a simple biological model for allele-specific expression and supports use of the ACME model. We show that Type I error is well controlled under our approach in a realistic setting, so that rank-based normalizations are unnecessary. Furthermore, we show that such normalizations can be detrimental to power and estimation accuracy under the proposed model. We then show, through effect size analyses of whole-genome cis-eQTLs in the GTEx data, that using standard normalizations instead of ACME noticeably affects the ranking and sign of estimates.
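To make the "log-of-linear" idea concrete, the following is a minimal sketch, not the authors' ACME implementation. It assumes a simplified form of the model without covariates, log(y) = log(b0 + b1*g) + noise, where g is the minor-allele count (0, 1, or 2). The parameterization, the function names (`profile_sse`, `fit_log_of_linear`), and the simulated values are all illustrative assumptions. Under Gaussian noise on the log scale, minimizing the sum of squared residuals coincides with maximum likelihood for the mean parameters. In place of the non-linear least-squares algorithm mentioned in the abstract, the sketch rewrites the model as log(y) = a + log(1 + eta*g), profiles out the intercept a, and runs a one-dimensional search over eta = b1/b0.

```python
import math
import random


def profile_sse(eta, genotypes, log_expr):
    """Return (SSE, a) for log(y) = a + log(1 + eta*g), with a set to its
    least-squares value for the given eta (the mean of the offset residuals)."""
    resid = [ly - math.log1p(eta * g) for g, ly in zip(genotypes, log_expr)]
    a = sum(resid) / len(resid)
    return sum((r - a) ** 2 for r in resid), a


def fit_log_of_linear(genotypes, log_expr, lo=-0.49, hi=20.0, grid=400, iters=60):
    """Least-squares fit of the assumed model log(y) = log(b0 + b1*g).

    eta = b1/b0 must stay above -0.5 so that b0 + b1*g > 0 for g in {0, 1, 2}.
    A coarse grid locates the best eta; golden-section search then refines it.
    """
    def sse(eta):
        return profile_sse(eta, genotypes, log_expr)[0]

    etas = [lo + (hi - lo) * k / grid for k in range(grid + 1)]
    best = min(range(grid + 1), key=lambda k: sse(etas[k]))
    left, right = etas[max(best - 1, 0)], etas[min(best + 1, grid)]

    ratio = (math.sqrt(5) - 1) / 2  # golden-section ratio, about 0.618
    for _ in range(iters):
        c = right - ratio * (right - left)
        d = left + ratio * (right - left)
        if sse(c) < sse(d):
            right = d  # minimum lies in [left, d]
        else:
            left = c   # minimum lies in [c, right]

    eta = (left + right) / 2
    total_sse, a = profile_sse(eta, genotypes, log_expr)
    b0 = math.exp(a)
    return b0, b0 * eta, total_sse


# Simulated example (all values are hypothetical, chosen for illustration).
random.seed(0)
true_b0, true_b1, noise_sd = 10.0, 3.0, 0.1
genotypes = [random.choice([0, 1, 2]) for _ in range(2000)]
log_expr = [math.log(true_b0 + true_b1 * g) + random.gauss(0.0, noise_sd)
            for g in genotypes]

b0_hat, b1_hat, sse_hat = fit_log_of_linear(genotypes, log_expr)
print(f"b0 estimate: {b0_hat:.3f}, b1 estimate: {b1_hat:.3f}")
```

In this parameterization b1 is the additive contribution of each copy of the allele on the original expression scale, which is what makes the estimate directly interpretable. A linear fit to log-transformed or rank-normalized expression estimates a different quantity.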