Schweiger Regev, Kaufman Shachar, Laaksonen Reijo, Kleber Marcus E, März Winfried, Eskin Eleazar, Rosset Saharon, Halperin Eran
Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 6997801, Israel.
Department of Statistics, Tel Aviv University, Tel Aviv 6997801, Israel.
Am J Hum Genet. 2016 Jun 2;98(6):1181-1192. doi: 10.1016/j.ajhg.2016.04.016.
Estimation of heritability is fundamental in genetic studies. Recently, heritability estimation using linear mixed models (LMMs) has gained popularity because these estimates can be obtained from unrelated individuals collected in genome-wide association studies. Typically, heritability estimation under LMMs uses the restricted maximum likelihood (REML) approach. Existing methods for the construction of confidence intervals and estimators of SEs for REML rely on asymptotic properties. However, these assumptions are often violated because of the bounded parameter space, statistical dependencies, and limited sample size, leading to biased estimates and inflated or deflated confidence intervals. Here, we show that the estimation of confidence intervals by state-of-the-art methods is inaccurate, especially when the true heritability is relatively low or relatively high. We further show that these inaccuracies occur in datasets including thousands of individuals. Such biases are present, for example, in estimates of heritability of gene expression in the Genotype-Tissue Expression project and of lipid profiles in the Ludwigshafen Risk and Cardiovascular Health study. We also show that often the probability that the genetic component is estimated as 0 is high even when the true heritability is bounded away from 0, emphasizing the need for accurate confidence intervals. We propose a computationally efficient method, ALBI (accurate LMM-based heritability bootstrap confidence intervals), for estimating the distribution of the heritability estimator and for constructing accurate confidence intervals. Our method can be used as an add-on to existing methods for estimating heritability and variance components, such as GCTA, FaST-LMM, GEMMA, or EMMAX.
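The idea of estimating the distribution of the heritability estimator by bootstrap can be illustrated with a small simulation. The sketch below is not the ALBI implementation. It assumes a toy model with no covariates (so REML coincides with maximum likelihood), a simulated kinship matrix, and a simple grid search over heritability values in [0, 1]. The function names `simulate_kinship_eigenvalues`, `estimate_h2`, and `bootstrap_h2_estimates` are hypothetical and chosen only for this example.

```python
import numpy as np


def simulate_kinship_eigenvalues(n, m, rng):
    """Eigenvalues of a toy kinship matrix K = Z Z^T / m (assumption: random normal genotypes)."""
    Z = rng.standard_normal((n, m))
    K = Z @ Z.T / m
    # Floor the eigenvalues so that log and division stay finite at h2 = 1.
    return np.clip(np.linalg.eigvalsh(K), 1e-8, None)


def estimate_h2(y_rot_sq, d, grid):
    """Grid-search estimate of h2 under y ~ N(0, sigma^2 * (h2*K + (1 - h2)*I)).

    y_rot_sq holds the squared phenotype entries after rotation by the
    eigenvectors of K; d holds the eigenvalues of K. With no covariates,
    the REML and maximum likelihood estimates are the same.
    """
    n = d.size
    # Per-coordinate variance (up to sigma^2) for every candidate h2: shape (len(grid), n).
    v = grid[:, None] * d[None, :] + (1.0 - grid[:, None])
    # Profile out the total variance sigma^2 for each candidate h2.
    sigma2 = (y_rot_sq[None, :] / v).mean(axis=1)
    loglik = -0.5 * (n * np.log(sigma2) + np.log(v).sum(axis=1))
    return grid[np.argmax(loglik)]


def bootstrap_h2_estimates(true_h2, d, grid, n_draws, rng):
    """Parametric bootstrap: draw phenotypes at a given true h2 and re-estimate h2 each time."""
    # In the eigenbasis of K the phenotype coordinates are independent normals.
    sd = np.sqrt(true_h2 * d + (1.0 - true_h2))
    estimates = np.empty(n_draws)
    for b in range(n_draws):
        y_rot = sd * rng.standard_normal(d.size)
        estimates[b] = estimate_h2(y_rot**2, d, grid)
    return estimates


rng = np.random.default_rng(0)
d = simulate_kinship_eigenvalues(n=200, m=1000, rng=rng)
grid = np.linspace(0.0, 1.0, 101)
estimates = bootstrap_h2_estimates(true_h2=0.2, d=d, grid=grid, n_draws=500, rng=rng)

print("fraction of estimates exactly 0:", np.mean(estimates == 0.0))
print("2.5% and 97.5% quantiles of the estimator:", np.quantile(estimates, [0.025, 0.975]))
```

Repeating the bootstrap over a range of true heritability values gives the estimator's distribution as a function of the true value, which is the kind of information needed to build confidence intervals that respect the boundaries at 0 and 1. Working in the eigenbasis of the kinship matrix keeps each draw cheap, because the likelihood then depends only on the eigenvalues and the squared rotated phenotypes.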