Department of Biostatistics, University of Alabama at Birmingham, Alabama, United States of America.
PLoS Genet. 2011 Apr;7(4):e1002051. doi: 10.1371/journal.pgen.1002051. Epub 2011 Apr 28.
Despite rapid advances in genomic technology, our ability to account for phenotypic variation using genetic information remains limited for many traits. This has unfortunately resulted in limited application of genetic data towards preventive and personalized medicine, one of the primary impetuses of genome-wide association studies. Recently, a large proportion of the "missing heritability" for human height was statistically explained by modeling thousands of single nucleotide polymorphisms (SNPs) concurrently. However, it is currently unclear how gains in explained genetic variance will translate to the prediction of yet-to-be observed phenotypes. Using data from the Framingham Heart Study, we explore the genomic prediction of human height in training and validation samples while varying the statistical approach used, the number of SNPs included in the model, the validation scheme, and the number of subjects used to train the model. In our training datasets, we are able to explain a large proportion of the variation in height (h² up to 0.83, R² up to 0.96). However, the proportion of variance accounted for in validation samples is much smaller (ranging from 0.15 to 0.36 depending on the degree of familial information used in the training dataset). While such R² values vastly exceed what has been previously reported using a reduced number of pre-selected markers (<0.10), given the heritability of the trait (~0.80), substantial room for improvement remains.
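The gap between training and validation R² described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the analysis performed in the study: it uses simulated genotypes rather than Framingham data, assumes unrelated individuals and independent SNPs with allele frequency 0.5, and uses ridge regression as a generic whole-genome shrinkage model, which may differ from the statistical approaches the authors compared. The function names (`run_demo`, `ridge_fit`, `r_squared`) and all parameter values are hypothetical.

```python
import numpy as np


def r_squared(y, y_hat):
    """Squared Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y, y_hat)[0, 1] ** 2)


def ridge_fit(X, y, lam):
    """Ridge estimates of SNP effects via the dual form.

    beta = X^T (X X^T + lam * I)^{-1} y, which only requires solving an
    n x n system and is therefore cheap when subjects << SNPs.
    """
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha


def run_demo(n_train=300, n_valid=300, n_snps=2000, h2=0.8, seed=0):
    """Simulate a trait with heritability h2 and compare train/validation R^2."""
    rng = np.random.default_rng(seed)
    n = n_train + n_valid

    # Assumption: independent SNPs coded 0/1/2 with allele frequency 0.5.
    X = rng.binomial(2, 0.5, size=(n, n_snps)).astype(float)
    X -= X.mean(axis=0)

    # Every SNP has a small effect; rescale so genetic variance equals h2
    # and total phenotypic variance is approximately 1.
    g = X @ rng.normal(size=n_snps)
    g *= np.sqrt(h2 / g.var())
    y = g + rng.normal(scale=np.sqrt(1.0 - h2), size=n)

    X_tr, y_tr = X[:n_train], y[:n_train]
    X_va, y_va = X[n_train:], y[n_train:]

    # Shrinkage parameter: residual variance / per-SNP effect variance,
    # where each centered SNP has variance 2 * 0.5 * 0.5 = 0.5.
    lam = (1.0 - h2) / (h2 / (n_snps * 0.5))

    beta_hat = ridge_fit(X_tr, y_tr - y_tr.mean(), lam)
    return r_squared(y_tr, X_tr @ beta_hat), r_squared(y_va, X_va @ beta_hat)


if __name__ == "__main__":
    r2_train, r2_valid = run_demo()
    print(f"training R^2:   {r2_train:.2f}")
    print(f"validation R^2: {r2_valid:.2f}")
```

Because the model has far more SNP effects than training subjects, it can fit the training phenotypes closely, while the validation R² stays well below the simulated heritability. The numbers from this toy setup are not comparable to the values reported in the abstract, which also depend on the familial information shared between training and validation samples.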