Deng Yun, Nielsen Rasmus, Song Yun S
Center for Computational Biology, University of California, Berkeley, CA, USA.
Department of Statistics, University of California, Berkeley, CA, USA.
Nat Genet. 2025 Sep 8. doi: 10.1038/s41588-025-02317-9.
The Ancestral Recombination Graph (ARG), which describes the genealogical history of a sample of genomes, is a vital tool in population genomics and biomedical research. Recent advancements have substantially increased ARG reconstruction scalability, but they rely on approximations that can reduce accuracy, especially under model misspecification. Moreover, they reconstruct only a single ARG topology and cannot quantify the considerable uncertainty associated with ARG inferences. Here, to address these challenges, we introduce SINGER (sampling and inferring of genealogies with recombination), a method that accelerates ARG sampling from the posterior distribution by two orders of magnitude, enabling accurate inference and uncertainty quantification for hundreds of whole-genome sequences. Through extensive simulations, we demonstrate SINGER's enhanced accuracy and robustness to model misspecification compared to existing methods. We demonstrate the utility of SINGER by applying it to individuals of British and African descent within the 1000 Genomes Project, identifying signals of population differentiation, archaic introgression and strong support for ancient polymorphism in the human leukocyte antigen region shared across primates.