Centre for Mathematics and Applications (CMA) and Department of Mathematics, FCT - NOVA University of Lisbon, Lisbon, Portugal.
Department of Statistics, Federal University of Bahia, Bahia, Brazil.
Bioinformatics. 2017 Nov 15;33(22):3584-3594. doi: 10.1093/bioinformatics/btx457.
In genetic association studies, linear mixed models (LMMs) are used to test for associations between phenotypes and candidate single nucleotide polymorphisms (SNPs). The same models are also used to estimate heritability, which is central not only to evolutionary biology but also to predicting the response to selection in plant and animal breeding and to predicting disease risk in humans. However, when one or more of the underlying assumptions are violated, the estimation of variance components may be compromised, and with it the estimates of heritability and of any other functions of the variance components. Datasets obtained from real-life experiments are prone to several sources of contamination, which usually violate the assumption that the errors are normally distributed. For this reason, a robust derivative-free restricted maximum likelihood (DF-REML) framework, together with a robust coefficient of determination, is proposed for the LMM in the context of genetic studies of continuous traits.
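To make the link between variance components and heritability concrete, the following sketch writes out a common textbook form of the LMM and the resulting SNP-based heritability. The notation is an assumption made here for illustration and is not taken from the article: y is the phenotype vector, X and Z are design matrices, K is a SNP-derived relationship matrix, and σ_g² and σ_e² are the genetic and residual variance components.

```latex
% Illustrative notation only; the article's own symbols may differ.
\begin{align}
  y &= X\beta + Zu + \varepsilon, \\
  u &\sim \mathcal{N}\!\left(0,\; \sigma_g^{2} K\right), \qquad
  \varepsilon \sim \mathcal{N}\!\left(0,\; \sigma_e^{2} I\right), \\
  h^{2} &= \frac{\sigma_g^{2}}{\sigma_g^{2} + \sigma_e^{2}}.
\end{align}
```

Under this formulation, heritability is a ratio of the estimated variance components, so any bias that contamination introduces into those estimates carries over directly to the heritability estimate.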
The proposed approach, in addition to the robust estimation of variance components and robust computation of the coefficient of determination, allows in particular for the robust estimation of SNP-based heritability by reducing the bias and increasing the precision of its estimates. The performance of both classical and robust DF-REML approaches is compared via a Monte Carlo simulation study. Additionally, three examples of application of the methodologies to real datasets are given in order to validate the usefulness of the proposed robust approach. Although the main focus of this article is on plant breeding applications, the proposed methodology is applicable to both human and animal genetic studies.
Source code implemented in R is available in the Supplementary Material.
Supplementary data are available at Bioinformatics online.