Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA.
Bioinformatics. 2021 Dec 11;37(24):4677-4683. doi: 10.1093/bioinformatics/btab555.
BAli-Phy, a popular Bayesian method that co-estimates multiple sequence alignments and phylogenetic trees, is statistically rigorous, but its computational requirements have generally limited it to relatively small datasets (at most about 100 sequences). Here, we repurpose BAli-Phy as a 'phylogeny-aware' alignment method: we estimate the phylogeny from the unaligned input sequences and then use it as a fixed tree within BAli-Phy.
We show that this approach achieves high accuracy: it is greatly superior to Prank, currently the most popular phylogeny-aware alignment method, and is even more accurate than MAFFT, one of the top-performing alignment methods in common use. Furthermore, this approach can be used to align very large datasets (up to 1000 sequences in this study).
See https://doi.org/10.13012/B2IDB-7863273_V1 for datasets used in this study.
Supplementary data are available at Bioinformatics online.
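The two-step workflow described above (estimate a tree from the unaligned sequences, then align with that tree held fixed) can be sketched as a small Python driver. This is only an illustrative outline under stated assumptions: it does not call BAli-Phy or any real tree estimator, the abstract does not say which tree-estimation method is used, and the names `phylogeny_aware_align`, `toy_star_tree` and `toy_pad_align` are hypothetical stand-ins introduced here.

```python
from typing import Callable, Dict, Tuple

Sequences = Dict[str, str]   # taxon name -> unaligned sequence
Tree = str                   # tree as a Newick string
Alignment = Dict[str, str]   # taxon name -> gapped sequence


def phylogeny_aware_align(
    seqs: Sequences,
    estimate_tree: Callable[[Sequences], Tree],
    align_on_fixed_tree: Callable[[Sequences, Tree], Alignment],
) -> Tuple[Tree, Alignment]:
    """Estimate a tree once, then align with that tree held fixed."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences to align")

    # Step 1: estimate the phylogeny from the unaligned input.
    tree = estimate_tree(seqs)

    # Step 2: run the aligner with the tree fixed (never re-estimated).
    alignment = align_on_fixed_tree(seqs, tree)

    # Sanity checks that any valid alignment of the input must pass.
    if set(alignment) != set(seqs):
        raise ValueError("alignment must contain exactly the input taxa")
    if len({len(row) for row in alignment.values()}) != 1:
        raise ValueError("all aligned rows must have the same length")
    for name, row in alignment.items():
        if row.replace("-", "") != seqs[name]:
            raise ValueError(f"row {name!r} does not match its input sequence")
    return tree, alignment


# Hypothetical toy stand-ins so the sketch runs; NOT real methods.
def toy_star_tree(seqs: Sequences) -> Tree:
    """Return an unresolved star tree over the taxa (placeholder)."""
    return "(" + ",".join(sorted(seqs)) + ");"


def toy_pad_align(seqs: Sequences, tree: Tree) -> Alignment:
    """Ignore the tree and right-pad with gaps (placeholder aligner)."""
    width = max(len(s) for s in seqs.values())
    return {name: s.ljust(width, "-") for name, s in seqs.items()}


if __name__ == "__main__":
    example = {"a": "ACGT", "b": "AC", "c": "ACG"}
    tree, aln = phylogeny_aware_align(example, toy_star_tree, toy_pad_align)
    print(tree)
    for name in sorted(aln):
        print(name, aln[name])
```

In practice the two callables would wrap external tools, for example a tree estimator for step 1 and BAli-Phy for step 2. Keeping them as parameters makes the fixed-tree design explicit: the tree is computed once and passed through unchanged.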