distAngsd: Fast and Accurate Inference of Genetic Distances for Next-Generation Sequencing Data.
Author information
Section for Geogenetics, Globe Institute, University of Copenhagen, Øster Voldgade 5-7, 1350 København K, Denmark.
Department of Integrative Biology, University of California, 3040 Valley Life Sciences Building 3140, Berkeley, CA 94720-3140, USA.
Publication information
Mol Biol Evol. 2022 Jun 2;39(6). doi: 10.1093/molbev/msac119.
Commonly used methods for inferring phylogenies were designed before the emergence of high-throughput sequencing and generally cannot accommodate the challenges associated with noisy, diploid sequencing data. In many applications, diploid genomes are still treated as haploid through the use of ambiguity characters, while the uncertainty in genotype calling, which arises as a consequence of the sequencing technology, is ignored. To address this problem, we describe two new probabilistic approaches for estimating genetic distances: distAngsd-geno and distAngsd-nuc, both implemented in a software suite named distAngsd. These methods are specifically designed for next-generation sequencing data, use the full information from the data, and take uncertainty in genotype calling into account. Through extensive simulations, we show that these new methods are markedly more accurate and have more stable statistical behavior than other currently available methods for estimating genetic distances, even for very low depth data with high error rates.
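To make the phrase "uncertainty in genotype calling" concrete, the following minimal Python sketch computes likelihoods for the ten diploid genotypes at a single site from a handful of base calls. It is not the distAngsd implementation: it assumes a generic symmetric sequencing-error model, independent reads, and a flat prior over genotypes, and all names (`base_prob`, `genotype_likelihoods`, `normalize`) are hypothetical and chosen only for illustration.

```python
import itertools

BASES = "ACGT"
# The 10 unordered diploid genotypes: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT.
GENOTYPES = [a + b for a, b in itertools.combinations_with_replacement(BASES, 2)]


def base_prob(observed, allele, err):
    """P(observed base | true allele), assuming a symmetric error model."""
    return 1.0 - err if observed == allele else err / 3.0


def genotype_likelihoods(reads):
    """Likelihood of each diploid genotype given (base, error_rate) reads.

    Assumes each read samples one of the two alleles with probability 1/2
    and that reads are independent.
    """
    liks = {}
    for g in GENOTYPES:
        lik = 1.0
        for base, err in reads:
            lik *= 0.5 * base_prob(base, g[0], err) + 0.5 * base_prob(base, g[1], err)
        liks[g] = lik
    return liks


def normalize(liks):
    """Turn likelihoods into posterior probabilities under a flat prior (an assumption)."""
    total = sum(liks.values())
    return {g: v / total for g, v in liks.items()}


# Illustrative low-depth site: two reads, both 'A', each with a 1% error rate.
reads = [("A", 0.01), ("A", 0.01)]
post = normalize(genotype_likelihoods(reads))
best = max(post, key=post.get)
print(best, round(post[best], 3))
for g in ("AC", "AG", "AT"):
    print(g, round(post[g], 3))
```

At depth two, a hard genotype call would report AA, yet under these assumptions the heterozygous genotypes AC, AG, and AT together retain a large share of the posterior mass. A method that collapses the site to a single called genotype discards that information, which is the kind of loss the abstract attributes to approaches that ignore genotype-calling uncertainty.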