Wei Yuan, Zhi Degui, Zhang Shaojie
Department of Computer Science, University of Central Florida, Orlando, FL, USA.
McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.
bioRxiv. 2024 Sep 25:2023.11.17.567650. doi: 10.1101/2023.11.17.567650.
The availability of large genotyped cohorts brings new opportunities for revealing the high-resolution genetic structure of admixed populations via local ancestry inference (LAI), the process of identifying the ancestry of each segment of an individual's haplotype. Although current methods achieve high accuracy in standard cases, LAI remains challenging in three situations: when reference populations are closely related (e.g., intra-continental), when reference populations are numerous, or when admixture events are deep in time. All of these are increasingly unavoidable in large biobanks. Here, we present a new LAI method, Recomb-Mix. Recomb-Mix integrates elements of existing methods that use the site-based Li and Stephens model and introduces a new graph-collapsing trick that simplifies counting paths with the same ancestry label readout. Through comprehensive benchmarking on various simulated datasets, we show that Recomb-Mix is more accurate than existing methods across diverse scenarios while remaining competitive in resource efficiency. We expect Recomb-Mix to be a useful method for advancing genetic studies of admixed populations.
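The idea of collapsing paths that share an ancestry label readout can be illustrated with a toy sketch. This is not Recomb-Mix's actual algorithm. It rests on several assumptions that are not taken from the paper. Reference haplotypes with the same population label are merged, at each site, into the set of alleles observed in that population. Switching between haplotypes within a population is treated as free. A path is then scored by a fixed cost for each cross-population switch plus a fixed cost for each allele mismatch, and the minimum-cost label path is found by Viterbi-style dynamic programming. The function name `toy_local_ancestry` and the parameters `switch_cost` and `mismatch_cost` are hypothetical.

```python
from typing import Dict, List, Sequence


def toy_local_ancestry(
    query: Sequence[int],
    panel: Sequence[Sequence[int]],
    labels: Sequence[str],
    switch_cost: float = 3.0,    # hypothetical penalty per cross-population switch
    mismatch_cost: float = 1.0,  # hypothetical penalty per allele mismatch
) -> List[str]:
    """Assign one ancestry label per site of `query` (toy model, not Recomb-Mix)."""
    pops = sorted(set(labels))
    n_sites = len(query)

    # Simplified "collapsing": at each site, merge all reference haplotypes of a
    # population into the set of alleles that population carries there.
    alleles = [
        {p: {hap[s] for hap, lab in zip(panel, labels) if lab == p} for p in pops}
        for s in range(n_sites)
    ]

    def emit(s: int, p: str) -> float:
        # Assumption: zero cost if any haplotype of population p matches the query.
        return 0.0 if query[s] in alleles[s][p] else mismatch_cost

    def trans(q: str, p: str) -> float:
        # Assumption: within-population moves are free; label changes cost switch_cost.
        return 0.0 if q == p else switch_cost

    # Forward pass: minimum path cost ending in each population label.
    cost: Dict[str, float] = {p: emit(0, p) for p in pops}
    back: List[Dict[str, str]] = []
    for s in range(1, n_sites):
        new_cost: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for p in pops:
            best_prev = min(pops, key=lambda q: cost[q] + trans(q, p))
            new_cost[p] = cost[best_prev] + trans(best_prev, p) + emit(s, p)
            ptr[p] = best_prev
        cost = new_cost
        back.append(ptr)

    # Traceback from the cheapest final label.
    label = min(pops, key=lambda p: cost[p])
    path = [label]
    for ptr in reversed(back):
        label = ptr[label]
        path.append(label)
    return path[::-1]


panel = [[0] * 8, [0] * 8, [1] * 8, [1] * 8]
labels = ["A", "A", "B", "B"]
print(toy_local_ancestry([0, 0, 0, 0, 1, 1, 1, 1], panel, labels))
# prints ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
```

Because the dynamic program runs over population labels rather than individual haplotypes, its per-site work in this toy version scales with the number of populations squared instead of the number of reference haplotypes squared. An isolated mismatch (cost 1) is cheaper than two switches (cost 6), so a single discordant site does not flip the inferred ancestry.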