Berger Swetlana, Pérez-Rodríguez Paulino, Veturi Yogasudha, Simianer Henner, de los Campos Gustavo
Animal Breeding and Genetics Group, Department of Animal Sciences, Georg-August-University Goettingen, Albrecht-Thaer-Weg 3, Goettingen, Germany.
Ann Hum Genet. 2015 Mar;79(2):122-35. doi: 10.1111/ahg.12099. Epub 2015 Jan 20.
Genome-wide association studies (GWAS) have detected large numbers of variants associated with complex human traits and diseases. However, the proportion of variance explained by GWAS-significant single nucleotide polymorphisms has usually been small. This has prompted interest in the use of whole-genome regression (WGR) methods. Nevertheless, there has been limited research on the factors that affect the prediction accuracy (PA) of WGRs when applied to human data from distantly related individuals. Here, using real human genotypes and simulated phenotypes, we examine how trait complexity, marker-quantitative trait loci (QTL) linkage disequilibrium (LD), and the model used affect the performance of WGRs. Our results indicated that the estimated rate of missing heritability depends on the extent of marker-QTL LD, whereas it was not greatly affected by trait complexity. Regarding PA, our results indicated that: (a) under perfect marker-QTL LD, WGR can achieve moderately high prediction accuracy, and with simple genetic architectures variable selection methods outperform shrinkage procedures; and (b) under imperfect marker-QTL LD, variable selection methods can achieve reasonably good PA with simple or moderately complex genetic architectures; however, the PA of these methods deteriorated as trait complexity increased, and with highly complex traits variable selection and shrinkage methods both performed poorly. This was confirmed with an analysis of human height.
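The dependence of missing heritability on marker-QTL LD reported in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' procedure: it assumes a single QTL, standard-normal stand-ins for genotypes rather than real SNP data, and hypothetical names and parameters (`explained_variance`, `h2`, `ld_r`). Under these assumptions, a marker whose correlation with the QTL is `ld_r` is expected to explain about `ld_r**2 * h2` of the phenotypic variance, so the share of heritability that goes missing is roughly `1 - ld_r**2`.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def explained_variance(n=20000, h2=0.5, ld_r=0.8, seed=1):
    """Return (R^2 of phenotype on QTL, R^2 of phenotype on marker).

    Assumptions (illustrative only): one QTL, Gaussian stand-ins for
    genotypes, heritability h2, marker-QTL correlation ld_r.
    """
    rng = random.Random(seed)
    qtl, marker, pheno = [], [], []
    for _ in range(n):
        q = rng.gauss(0.0, 1.0)  # standardized QTL "genotype"
        # Marker correlated with the QTL at level ld_r (imperfect LD if < 1).
        m = ld_r * q + (1.0 - ld_r ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        # Phenotype: genetic signal from the QTL plus environmental noise.
        y = h2 ** 0.5 * q + (1.0 - h2) ** 0.5 * rng.gauss(0.0, 1.0)
        qtl.append(q)
        marker.append(m)
        pheno.append(y)
    return pearson(qtl, pheno) ** 2, pearson(marker, pheno) ** 2


if __name__ == "__main__":
    r2_qtl, r2_marker = explained_variance()
    missing = 1.0 - r2_marker / r2_qtl
    print(f"R^2 using the QTL itself: {r2_qtl:.3f}")
    print(f"R^2 using the linked marker: {r2_marker:.3f}")
    print(f"Estimated share of heritability missed: {missing:.3f}")
```

Setting `ld_r=1.0` makes the marker identical to the QTL, which corresponds to the perfect-LD scenario in which no heritability is lost to imperfect tagging.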