School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA.
BMC Biol. 2021 Feb 16;19(1):32. doi: 10.1186/s12915-021-00964-y.
The genealogical histories of individuals within populations are of interest to studies aiming to uncover both detailed pedigree information and overall quantitative population demographic histories. However, quantitative analysis of individual genealogical histories has been hampered by incomplete pedigree records and by a lack of objective, quantitative detail in pedigree information. Although complete pedigree information for most individuals is difficult to track beyond a few generations, it is possible to describe a person's genealogical history using their genetic relatives as revealed by identity by descent (IBD) segments: long genomic segments shared by two individuals within a population, which are identical due to inheritance from common ancestors. When modern biobanks collect genotype information for a significant fraction of a population, the dense genetic connections of a person can be traced using such IBD segments, offering opportunities to characterize individuals in the context of the underlying populations. Here, we conducted an individual-centric analysis of IBD segments among the UK Biobank participants, who represent 0.7% of the UK population.
We generated a high-quality call set of IBD segments over 5 cM among all 500,000 UK Biobank participants. On average, one UK individual shares IBD segments with 14,000 UK Biobank participants, whom we refer to as "relatives." Using these segments, approximately 80% of a person's genome can be imputed. We subsequently propose genealogical descriptors based on the genetic connections within cohorts of relatives, that is, individuals sharing at least one IBD segment, and show that such descriptors offer important information about one's genetic makeup, personal genealogical history, and social behavior. By analyzing the number of relatives sharing segments of different lengths, we identified a group, potentially British Jews, that has a distinct pattern of familial expansion history. Finally, using the enrichment of relatives in one's neighborhood, we identified regional variation in the personal preference for living closer to one's extended family.
Our analysis revealed genetic makeup, personal genealogical history, and social behaviors at the population scale, opening possibilities for further studies of individuals' genetic connections in biobank data.
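As an illustration of the "relatives" notion used above (individuals sharing at least one IBD segment over the length threshold), the following is a minimal sketch of how a per-individual relative count could be tallied from a list of called segments. It is not the authors' pipeline: the function name `count_relatives`, the `(id_a, id_b, length_cm)` tuple layout, the inclusive treatment of the 5 cM threshold, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def count_relatives(segments, min_cm=5.0):
    """Count, per individual, the distinct others sharing at least one
    IBD segment of length >= min_cm (hypothetical helper, not from the paper)."""
    relatives = defaultdict(set)
    for id_a, id_b, length_cm in segments:
        # Skip segments below the length threshold and self-pairs.
        if length_cm >= min_cm and id_a != id_b:
            relatives[id_a].add(id_b)
            relatives[id_b].add(id_a)
    return {person: len(others) for person, others in relatives.items()}


# Toy, made-up segments: (individual A, individual B, segment length in cM).
toy_segments = [
    ("p1", "p2", 7.2),
    ("p1", "p2", 12.0),  # second segment for the same pair: pair counted once
    ("p1", "p3", 5.5),
    ("p2", "p4", 3.1),   # below the threshold, so ignored
]

relative_counts = count_relatives(toy_segments)
print(relative_counts)
```

Using a set per individual means that a pair sharing several segments is still counted as a single relative, which matches the "at least one IBD segment" wording; individuals with no qualifying segment simply do not appear in the result.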