The Research School of Biology, The Australian National University, ACT, Australia.
School of Biological Sciences, University of Auckland, Auckland, New Zealand.
PLoS Comput Biol. 2021 Sep 13;17(9):e1008949. doi: 10.1371/journal.pcbi.1008949. eCollection 2021 Sep.
A current strategy for obtaining haplotype information from several individuals involves short-read sequencing of pooled amplicons, in which the fragments from each individual are identified by a unique DNA barcode. In this paper, we report a new method to recover the phylogeny of haplotypes from short-read sequences obtained from pooled amplicons of a mixture of individuals, without barcoding. The method, AFPhyloMix, accepts an alignment of the pooled reads against a reference sequence, extracts the single-nucleotide polymorphism (SNP) patterns along the alignment, and constructs the phylogenetic tree from these SNP patterns. AFPhyloMix adopts a Bayesian inference model to estimate the phylogeny of the haplotypes and their relative abundances, assuming that the number of haplotypes is known. In our simulations, AFPhyloMix achieved at least 80% accuracy in recovering the phylogenies and relative abundances of the constituent haplotypes, for mixtures of up to 15 haplotypes. AFPhyloMix also worked well on a real data set of kangaroo mitochondrial DNA sequences.
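The abstract does not spell out AFPhyloMix's model. The following is a minimal sketch of one general idea for linking SNP patterns to a phylogeny. It assumes an infinite-sites model, in which each SNP arises once on a single branch, so the expected frequency of the alternate allele in the pooled reads equals the summed relative abundance of the haplotypes below that branch. The tree, the abundances, the error rate `err`, the binomial read model, and all function names (`branch_frequencies`, `assign_snp`, etc.) are hypothetical illustrations, not AFPhyloMix's actual implementation or likelihood.

```python
from math import comb, log

# Hypothetical rooted tree of four haplotypes: internal nodes map to children.
TREE = {"root": ["n1", "H4"], "n1": ["H1", "n2"], "n2": ["H2", "H3"]}
# Hypothetical relative abundances of the haplotypes in the pool (sum to 1).
ABUNDANCE = {"H1": 0.4, "H2": 0.3, "H3": 0.2, "H4": 0.1}


def leaves_under(node, tree):
    """Return the set of haplotypes (leaves) descending from `node`."""
    if node not in tree:
        return {node}
    leaves = set()
    for child in tree[node]:
        leaves |= leaves_under(child, tree)
    return leaves


def branch_frequencies(tree, abundance, root="root"):
    """Map each non-root node to the expected alternate-allele frequency
    of a SNP on the branch above it (infinite-sites assumption)."""
    freqs = {}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            freqs[child] = sum(abundance[h] for h in leaves_under(child, tree))
            stack.append(child)
    return freqs


def log_likelihood(alt_reads, depth, freq, err=0.01):
    """Binomial log-likelihood of observing `alt_reads` alternate alleles
    among `depth` reads, with an assumed symmetric per-read error rate."""
    p = freq * (1 - err) + (1 - freq) * err
    return (log(comb(depth, alt_reads))
            + alt_reads * log(p)
            + (depth - alt_reads) * log(1 - p))


def assign_snp(alt_reads, depth, freqs, err=0.01):
    """Return the branch whose expected allele frequency best explains a SNP."""
    return max(freqs, key=lambda b: log_likelihood(alt_reads, depth, freqs[b], err))


if __name__ == "__main__":
    freqs = branch_frequencies(TREE, ABUNDANCE)
    # A site where 48 of 100 reads carry the alternate allele.
    print(assign_snp(48, 100, freqs))  # prints n2 (expected frequency 0.5)
```

Here the SNP is scored only against a fixed candidate tree and fixed abundances. A full Bayesian treatment of the kind the abstract describes would instead place priors on the tree topology and the abundances and infer them jointly from all SNP sites, for example by MCMC sampling.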