Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA.
Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, CA 90095, USA; Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA.
Am J Hum Genet. 2021 Apr 1;108(4):620-631. doi: 10.1016/j.ajhg.2021.02.013. Epub 2021 Mar 9.
Phenotype prediction is a key goal for medical genetics. Unfortunately, most genome-wide association studies are done in European populations, which reduces the accuracy of polygenic score predictions in non-European populations. Here, we use population genetic models to show that human demographic history and negative selection on complex traits can result in population-specific genetic architectures. For traits where alleles with the largest effect on the trait are under the strongest negative selection, approximately half of the heritability can be accounted for by variants in Europe that are absent from Africa, leading to poor performance in phenotype prediction across these populations. Further, under such a model, individuals in the tails of the genetic risk distribution may not be identified via polygenic scores generated in another population. We empirically test these predictions by building a model that stratifies heritability between European-specific and shared variants and applying it to 37 traits and diseases in the UK Biobank. Across these phenotypes, approximately 30% of the heritability comes from European-specific variants. We conclude that genetic association studies need to include more diverse populations to make phenotype prediction useful in all populations.