National Centre for Register-based Research, Aarhus University, Aarhus 8210, Denmark.
Bioinformatics. 2022 Jun 27;38(13):3477-3480. doi: 10.1093/bioinformatics/btac348.
Measuring genetic diversity is an important problem: increasing genetic diversity is key to making new genetic discoveries, but it is also a major source of confounding that genetic studies must account for.
Using data from the UK Biobank, a prospective cohort study with deep genetic and phenotypic data on almost 500,000 individuals from across the UK, we carefully define 21 distinct ancestry groups from around the world. These ancestry groups can serve as a global reference panel of worldwide populations, with several applications. Here, we develop a method that uses allele frequencies and principal components derived from these ancestry groups to measure ancestry proportions from the allele frequencies of any genetic dataset.
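The idea of reading ancestry proportions off allele frequencies can be illustrated with a minimal sketch. It is not the bigsnpr implementation. It assumes the target dataset's allele frequencies are approximately a convex combination of the reference groups' frequencies, and it omits the principal-component projection described above. The solver (projected gradient descent onto the probability simplex) is swapped in for simplicity, and the function names and toy frequencies are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    sorted_desc = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(sorted_desc, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0:
            theta = candidate  # keep the threshold from the last valid k
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(target_freq, ref_freq, n_iter=2000):
    """Least-squares mixture weights under simplex constraints.

    target_freq[j]  : allele frequency of variant j in the target dataset
    ref_freq[j][k]  : allele frequency of variant j in reference group k
    Returns non-negative weights (one per group) that sum to 1.
    """
    n_var = len(ref_freq)
    n_grp = len(ref_freq[0])
    # The squared Frobenius norm bounds the largest eigenvalue of F^T F,
    # so 1 / norm is a safe (conservative) gradient step size.
    step = 1.0 / sum(x * x for row in ref_freq for x in row)
    w = [1.0 / n_grp] * n_grp  # start from equal proportions
    for _ in range(n_iter):
        # Residual of the current mixture: F w - target
        resid = [
            sum(row[k] * w[k] for k in range(n_grp)) - t
            for row, t in zip(ref_freq, target_freq)
        ]
        # Gradient of 0.5 * ||F w - target||^2 is F^T (F w - target)
        grad = [
            sum(ref_freq[j][k] * resid[j] for j in range(n_var))
            for k in range(n_grp)
        ]
        w = project_to_simplex([w[k] - step * grad[k] for k in range(n_grp)])
    return w


# Toy example (made-up numbers): 6 variants, 3 hypothetical reference groups.
ref_freq = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.8, 0.2, 0.5],
    [0.2, 0.7, 0.3],
    [0.5, 0.5, 0.1],
]
true_mix = [0.5, 0.3, 0.2]
# Frequencies of a cohort that is an exact mixture of the three groups.
target_freq = [sum(f * m for f, m in zip(row, true_mix)) for row in ref_freq]

estimated = estimate_proportions(target_freq, ref_freq)
print([round(w, 3) for w in estimated])
```

In this noise-free toy setting the target frequencies are an exact mixture, so the estimated weights should agree closely with the mixture used to build them. Real data have sampling noise and far more variants, which is one reason to work with principal components rather than raw frequencies.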
This method is implemented in the function snp_ancestry_summary of the R package bigsnpr.
Supplementary data are available at Bioinformatics online.