Williams Matthew P, Flegontov Pavel, Maier Robert, Huber Christian D
Pennsylvania State University, Department of Biology, University Park, PA 16802, USA.
Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia.
bioRxiv. 2023 Nov 15:2023.11.13.566841. doi: 10.1101/2023.11.13.566841.
Paleogenomics has expanded our knowledge of human evolutionary history. Since the 2020s, the study of ancient DNA has increasingly focused on reconstructing the recent past. However, it remains an open question how accurately paleogenomic methods can answer questions of historical and archaeological importance, given the greater demographic complexity and lower genetic differentiation of the historical period. We used two simulation approaches to evaluate the limitations and behavior of two commonly used methods, qpAdm and the f-statistic, in admixture inference. The first is based on branch-length data simulated from four simple demographic models of varying complexity and configuration. The second is an analysis of a Eurasian history comprising 59 populations, using whole-genome data modified to reflect ancient DNA conditions such as SNP ascertainment, data missingness, and pseudo-haploidization. We show that under conditions resembling historical populations, qpAdm can identify a small candidate set of true sources and populations closely related to them. However, under typical ancient DNA conditions, qpAdm is unable to further distinguish between these candidates, limiting its utility for resolving fine-scale hypotheses. Notably, we find that complex gene-flow histories generally improve the performance of qpAdm, and we observe no bias in the estimation of admixture weights. We offer a heuristic for admixture inference that combines the admixture weight estimates and p-values of qpAdm models with f-statistics to enhance the power to distinguish between multiple plausible candidates. Finally, we highlight the future potential of qpAdm through whole-genome branch-length f-statistics, demonstrating the improved demographic inference that could be achieved with advancements in f-statistic estimation.
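To make the ancient DNA conditions named in the abstract more concrete, the following is a minimal Python sketch of data missingness, pseudo-haploidization, and a naive f3-type statistic. It is an illustration only and not the authors' simulation pipeline: the function names (`add_missingness`, `pseudo_haploidize`, `naive_f3`), the use of -1 as a missing-data code, and the uncorrected f3 formula are all assumptions made for this example. Real analyses would normally rely on dedicated software such as ADMIXTOOLS, which applies finite-sample corrections and block-jackknife standard errors that are omitted here.

```python
import numpy as np

MISSING = -1  # assumed code for a missing genotype in this sketch


def add_missingness(genotypes, rate, rng):
    """Randomly mask genotypes as missing with probability `rate`."""
    g = np.array(genotypes, copy=True)
    g[rng.random(g.shape) < rate] = MISSING
    return g


def pseudo_haploidize(genotypes, rng):
    """Sample one allele per site from diploid genotypes.

    `genotypes` holds alternate-allele counts (0, 1, 2) or MISSING.
    Returns 0/1 haploid calls, with MISSING preserved.
    """
    g = np.asarray(genotypes)
    # Probability of drawing the alternate allele is count / 2.
    p_alt = np.where(g >= 0, g / 2.0, 0.0)
    calls = (rng.random(g.shape) < p_alt).astype(int)
    return np.where(g >= 0, calls, MISSING)


def naive_f3(target, source1, source2):
    """Naive f3(target; source1, source2) from per-SNP allele frequencies.

    Computes the mean over SNPs of (t - a) * (t - b) with no
    finite-sample bias correction (a simplification for illustration).
    """
    t, a, b = (np.asarray(x, dtype=float) for x in (target, source1, source2))
    return float(np.mean((t - a) * (t - b)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    diploid = np.array([0, 1, 2, 2, 1, 0])
    masked = add_missingness(diploid, rate=0.3, rng=rng)
    print(pseudo_haploidize(masked, rng))
    # A target whose frequencies lie between the two sources gives a negative value.
    print(naive_f3([0.5, 0.4], [0.0, 0.1], [1.0, 0.9]))
```

In this sketch, a negative `naive_f3` value arises when the target's allele frequencies lie between those of the two candidate sources, which is the signal an admixture test of this kind looks for.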