Zhou Xiang
University of Michigan.
Ann Appl Stat. 2017 Dec;11(4):2027-2051. doi: 10.1214/17-AOAS1052. Epub 2017 Dec 28.
Linear mixed models (LMMs) are among the most commonly used tools for genetic association studies. However, the standard method for estimating variance components in LMMs, restricted maximum likelihood estimation (REML), suffers from several important drawbacks: REML requires individual-level genotypes and phenotypes from all samples in the study, is computationally slow, and produces downward-biased estimates in case-control studies. To remedy these drawbacks, we present an alternative framework for variance component estimation, which we refer to as MQS. MQS is based on the method of moments (MoM) and the minimal norm quadratic unbiased estimation (MINQUE) criterion, and brings two seemingly unrelated methods, the renowned Haseman-Elston (HE) regression and the recent LD score regression (LDSC), into the same unified statistical framework. With this new framework, we provide an alternative but mathematically equivalent form of HE that allows for the use of summary statistics. We also provide an exact estimation form of LDSC that yields unbiased and statistically more efficient estimates. A key feature of our method is its ability to pair marginal z-scores computed using all samples with SNP correlation information computed using a small random subset of individuals (or individuals from a proper reference panel), while still producing estimates that can be almost as accurate as if both quantities were computed using the full data. As a result, our method produces unbiased and statistically efficient estimates, makes use of summary statistics, and is computationally efficient for large data sets. Using simulations and applications to 37 phenotypes from 8 real data sets, we illustrate the benefits of our method for estimating and partitioning SNP heritability in population studies as well as for heritability estimation in family studies. Our method is implemented in the GEMMA software package, freely available at www.xzlab.org/software.html.
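To make the method-of-moments idea concrete, here is a minimal Python sketch of textbook HE cross-product regression on simulated data. It is not the MQS estimator and not the GEMMA implementation; the function names, the simulation settings, and the choice of a regression through the origin are all assumptions made for illustration. The second function checks one identity that makes a summary-statistic form plausible: the quadratic form y'Ky computed from individual-level data equals (n/p) times the sum of squared marginal z-scores, where z-scores are defined here as Z'y/sqrt(n) on standardized data.

```python
import numpy as np


def standardize(a):
    """Center and scale to unit variance (column-wise for 2-D input)."""
    return (a - a.mean(axis=0)) / a.std(axis=0)


def he_regression(Z, y):
    """HE cross-product regression through the origin (illustrative only).

    Regresses y_i * y_j on the relatedness K_ij over all pairs i < j;
    the slope is a method-of-moments estimate of SNP heritability.
    """
    n, p = Z.shape
    K = Z @ Z.T / p                      # genetic relatedness matrix
    iu = np.triu_indices(n, k=1)         # indices of pairs with i < j
    k_off = K[iu]
    yy_off = np.outer(y, y)[iu]
    return float(k_off @ yy_off / (k_off @ k_off))


def quadratic_form_two_ways(Z, y):
    """Compute y'Ky from individual-level data and from marginal z-scores."""
    n, p = Z.shape
    K = Z @ Z.T / p
    individual = float(y @ K @ y)
    z = Z.T @ y / np.sqrt(n)             # assumed z-score definition
    summary = float(n / p * np.sum(z ** 2))
    return individual, summary


# Simulated data: all settings below are hypothetical.
rng = np.random.default_rng(0)
n, p, h2_true = 1000, 1000, 0.5
maf = rng.uniform(0.05, 0.5, size=p)
genotypes = rng.binomial(2, maf, size=(n, p)).astype(float)
Z = standardize(genotypes)
beta = rng.normal(0.0, np.sqrt(h2_true / p), size=p)
y = standardize(Z @ beta + rng.normal(0.0, np.sqrt(1.0 - h2_true), size=n))

h2_hat = he_regression(Z, y)
individual, summary = quadratic_form_two_ways(Z, y)
print(f"HE estimate of h2: {h2_hat:.3f} (simulated truth {h2_true})")
print(f"y'Ky individual-level: {individual:.4f}, from z-scores: {summary:.4f}")
```

The estimate varies with the random seed and sample size, so the sketch reports it rather than asserting a specific value. The two quadratic-form values agree up to floating-point error, because both are the same sum regrouped.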