Benchmarking kinship estimation tools for ancient genomes using pedigree simulations.

Affiliations

Department of Biological Sciences, Middle East Technical University, Ankara, Turkey.

Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey.

Publication information

Mol Ecol Resour. 2024 Jul;24(5):e13960. doi: 10.1111/1755-0998.13960. Epub 2024 Apr 27.

Abstract

There is growing interest in uncovering genetic kinship patterns in past societies using low-coverage palaeogenomes. Here, we benchmark four tools for kinship estimation with such data: lcMLkin, NgsRelate, KIN, and READ, which differ in their input, IBD estimation methods, and statistical approaches. We used pedigree and ancient genome sequence simulations to evaluate these tools when only a limited number (1 to 50 K, with minor allele frequency ≥0.01) of shared SNPs are available. The performance of all four tools was comparable using ≥20 K SNPs. We found that first-degree related pairs can be accurately classified even with 1 K SNPs, with 85% F scores using READ and 96% using NgsRelate or lcMLkin. Distinguishing third-degree relatives from unrelated pairs or second-degree relatives was also possible with high accuracy (F > 90%) with 5 K SNPs using NgsRelate and lcMLkin, while READ and KIN showed lower success (69 and 79% respectively). Meanwhile, noise in population allele frequencies and inbreeding (first-cousin mating) led to deviations in kinship coefficients, with different sensitivities across tools. We conclude that using multiple tools in parallel might be an effective approach to achieve robust estimates on ultra-low-coverage genomes.
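As a rough illustration of how the per-degree F scores quoted above could be computed, the following Python sketch classifies pairs by their estimated kinship coefficient and scores one class. It is not the procedure of lcMLkin, NgsRelate, KIN, or READ: the nearest-expected-value rule, the function names, and the example estimates are all assumptions made for illustration, and "F" is assumed to mean the usual F1 score (harmonic mean of precision and recall). The expected coefficients (0.25, 0.125, 0.0625, 0) are the standard values for outbred first-, second-, third-degree, and unrelated pairs.

```python
# Expected kinship coefficients (theta) in the absence of inbreeding.
# These are textbook values, not numbers taken from the paper.
EXPECTED_THETA = {
    "first": 0.25,
    "second": 0.125,
    "third": 0.0625,
    "unrelated": 0.0,
}


def classify_by_theta(theta):
    """Assign the degree whose expected theta is closest to the estimate.

    Hypothetical rule for illustration; real tools use their own models.
    """
    return min(EXPECTED_THETA, key=lambda d: abs(EXPECTED_THETA[d] - theta))


def f_score(true_labels, predicted_labels, target):
    """F1 score for a single class (assumed meaning of 'F score')."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == target and p == target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up estimates for four simulated pairs with known relationships.
true_degrees = ["first", "first", "third", "unrelated"]
theta_estimates = [0.24, 0.17, 0.05, 0.04]

predicted = [classify_by_theta(t) for t in theta_estimates]
print(predicted)
print(round(f_score(true_degrees, predicted, "first"), 3))
```

With noisy estimates such as these, a first-degree pair can drift toward the second-degree value and an unrelated pair toward the third-degree value, which is the kind of misclassification that lowers F scores when few SNPs are available.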
