Department of Computational Biology, University of Lausanne, Quartier Sorge, 1015 Lausanne, Switzerland.
Syst Biol. 2021 Jun 16;70(4):844-854. doi: 10.1093/sysbio/syaa081.
Next-generation-sequencing genotype callers are commonly used to call variants in newly sequenced species. However, because genomic resources remain limited, it is still common practice to use only one reference genome for a given genus, or even one reference for an entire clade of a higher taxon. The problem with traditional genotype callers, such as the one from GATK, is that they are optimized for variant calling at the population level. When these callers are instead used at the phylogenetic level, the consequences for downstream analyses can be substantial. Here, we performed simulations to compare the performance of the genotype callers of GATK and ATLAS, and present their differences at various phylogenetic scales. We show that the genotype caller of GATK substantially underestimates the number of variants at the phylogenetic level, but not at the population level. We also found that the accuracy of heterozygote calls declines with increasing distance to the reference genome. We quantified this decline and found that it is very sharp in GATK, while ATLAS maintains high accuracy even for species moderately divergent from the reference. We further suggest that efforts should be made to acquire more reference genomes per species before pursuing large-scale phylogenomic studies. [ATLAS; efficiency of SNP calling; GATK; heterozygote calling; next-generation sequencing; reference genome; variant calling.]