Erven J A M, Çakirlar C, Bradley D G, Raemaekers D C M, Madsen O
Groningen Institute of Archaeology, University of Groningen, Groningen, Netherlands.
Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland.
Front Genet. 2022 Jul 12;13:872486. doi: 10.3389/fgene.2022.872486. eCollection 2022.
Sequencing ancient DNA to high coverage is often limited by sample quality and cost. Imputing missing genotypes can potentially increase the information content and quality of ancient data, but requires different computational approaches than modern DNA imputation. Ancient imputation beyond humans has not been investigated. In this study we report the results of a systematic evaluation of imputation of three ancient whole-genome samples from the Early and Late Neolithic (∼7,100-4,500 BP), to test the utility of imputation. We show how genetic architecture and reference panel divergence, composition, and size affect imputation accuracy. We evaluate a variety of imputation methods, including Beagle5, GLIMPSE, and Impute5, with varying filters, pipelines, and variant calling methods. In most cases genotype concordance exceeded 90%, with the highest value, 98%, obtained with ∼2,000,000 variants recovered using GLIMPSE. Despite this high concordance, the sources of diversity present in the genotypes called in the original high-coverage genomes were not equally imputed, leading to biases in downstream analyses: we observe a trend toward the genotypes most common in the reference panel. This demonstrates that the current reference panel does not possess the full diversity needed for accurate imputation of ancient Sus scrofa, because it lacks variation from Near Eastern and Mesolithic wild boar. Imputation of ancient Sus scrofa holds potential but should be approached with caution because of these biases, and our results suggest that there is no universal approach for imputation of ancient non-human species.
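The abstract's point that a high overall concordance can coexist with biased imputation is easier to see with a per-genotype breakdown. The sketch below assumes concordance is the fraction of sites, called in both the high-coverage "truth" genotypes and the imputed genotypes, where the two agree, with genotypes coded as alternate-allele dosage (0, 1, 2). This definition, the function names, and the toy data are illustrative assumptions, not the authors' exact metric or pipeline.

```python
from collections import Counter

# Assumed coding: 0 = hom-ref, 1 = het, 2 = hom-alt; None = site not called.


def genotype_concordance(truth, imputed):
    """Fraction of sites called in both sets where the genotypes agree."""
    pairs = [(t, i) for t, i in zip(truth, imputed) if t is not None and i is not None]
    if not pairs:
        return float("nan")
    return sum(t == i for t, i in pairs) / len(pairs)


def concordance_by_class(truth, imputed):
    """Concordance stratified by the true (high-coverage) genotype class."""
    totals = Counter()
    matches = Counter()
    for t, i in zip(truth, imputed):
        if t is None or i is None:
            continue  # skip sites missing in either call set
        totals[t] += 1
        matches[t] += t == i  # bool counts as 0 or 1
    return {g: matches[g] / totals[g] for g in sorted(totals)}


# Hypothetical toy data: imputation pulls rarer genotypes toward hom-ref.
truth = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
imputed = [0, 0, 0, 0, 0, 0, 0, 1, 2, 0]

print(genotype_concordance(truth, imputed))  # 0.8
print(concordance_by_class(truth, imputed))  # {0: 1.0, 1: 0.5, 2: 0.5}
```

In this toy case the overall figure of 0.8 hides that heterozygous and hom-alt sites are recovered only half the time, the kind of skew toward common genotypes that the abstract describes.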