Yang Yaoling, Durbin Richard, Iversen Astrid K N, Lawson Daniel J
Department of Statistical Sciences, School of Mathematics, University of Bristol, Bristol, UK.
MRC Integrative Epidemiology Unit, Population Health Sciences, University of Bristol, Bristol, UK.
Nat Commun. 2025 Mar 20;16(1):2742. doi: 10.1038/s41467-025-57601-3.
Increasingly efficient methods for inferring the ancestral origin of genome regions are needed to gain insights into genetic function and history as biobanks grow in scale. Here we describe two near-linear-time algorithms that harness the strengths of the Positional Burrows-Wheeler Transform (PBWT) to learn ancestry. SparsePainter is a faster, sparse replacement for previous model-based 'chromosome painting' algorithms to identify recently shared haplotypes, whilst PBWTpaint uses further approximations to obtain lightning-fast estimates optimized for genome-wide relatedness estimation. The computational efficiency gains of these tools for fine-scale local ancestry inference make it possible to analyse large-scale genomic datasets using different approaches. Application to the UK Biobank shows that haplotypes represent ancestries better than principal components do, whilst linkage disequilibrium of ancestry identifies signals of recent changes to population-specific selection in many genomic regions associated with immune responses, suggesting avenues for understanding the pathogen-immune system interplay on a historical timescale.
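The abstract names the Positional Burrows-Wheeler Transform without describing it. As background only, here is a minimal sketch of the standard PBWT prefix- and divergence-array construction described in Durbin's original PBWT paper. It is not SparsePainter or PBWTpaint code. The function names `build_prefix_and_divergence` and `longest_suffix_match` are hypothetical, and haplotypes are assumed to be biallelic, encoded as lists of 0/1 alleles of equal length.

```python
from typing import Dict, List, Sequence, Tuple


def build_prefix_and_divergence(
    haps: Sequence[Sequence[int]],
) -> Tuple[List[int], List[int]]:
    """Return the PBWT prefix array a and divergence array d after the last site.

    Assumption: haps[h][k] is the 0/1 allele of haplotype h at site k.
    a lists haplotype indices sorted by their reversed prefixes; d[i] is the
    site where the match between a[i] and a[i-1] starts, so d[i] == n_sites
    means the two haplotypes share no suffix.
    """
    n_haps = len(haps)
    n_sites = len(haps[0]) if n_haps else 0
    a = list(range(n_haps))
    d = [0] * n_haps
    for k in range(n_sites):
        a0, b0, d0, e0 = [], [], [], []
        # k + 1 marks an empty match ending at site k
        p = q = k + 1
        for i in range(n_haps):
            h = a[i]
            # track the latest match start since the last 0- / 1-haplotype
            p = max(p, d[i])
            q = max(q, d[i])
            if haps[h][k] == 0:
                a0.append(h)
                d0.append(p)
                p = 0
            else:
                b0.append(h)
                e0.append(q)
                q = 0
        # stable partition: allele-0 haplotypes first, then allele-1
        a = a0 + b0
        d = d0 + e0
    return a, d


def longest_suffix_match(a: List[int], d: List[int], n_sites: int) -> Dict[int, int]:
    """Length of each haplotype's longest match ending at the last site.

    In PBWT order the best match of a[i] is with a neighbour, a[i-1] or a[i+1].
    """
    best: Dict[int, int] = {}
    for i, hap in enumerate(a):
        up = n_sites - d[i] if i > 0 else 0
        down = n_sites - d[i + 1] if i + 1 < len(a) else 0
        best[hap] = max(up, down)
    return best


if __name__ == "__main__":
    haps = [[0, 1, 0], [0, 1, 1], [1, 0, 0]]
    a, d = build_prefix_and_divergence(haps)
    print(a, d)  # [2, 0, 1] [3, 2, 3]
    print(longest_suffix_match(a, d, 3))  # {2: 1, 0: 1, 1: 0}
```

Each site is processed in a single pass over the M haplotypes, so building the arrays costs O(MN) for N sites. This linear scaling in the data size illustrates why PBWT-based approaches can handle biobank-scale panels; the specific approximations used by SparsePainter and PBWTpaint are not shown here.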