Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh.
Bioinformatics. 2023 Jun 1;39(6). doi: 10.1093/bioinformatics/btad332.
With recent breakthroughs in sequencing technology, estimating phylogenies at a much larger scale has become a major opportunity. To estimate large-scale phylogenies accurately, substantial effort is being devoted to introducing new algorithms or improving existing approaches. In this work, we improve the Quartet Fiduccia and Mattheyses (QFM) algorithm so that it resolves phylogenetic trees of better quality in less running time. QFM was already valued by researchers for the quality of its trees, but it fell short in larger phylogenomic studies because of its excessively slow running time.
We have redesigned QFM so that it can amalgamate millions of quartets over thousands of taxa into a species tree with high accuracy in a short time. Our version, named "QFM Fast and Improved" (QFM-FI), is 20 000× faster than the previous version of QFM and 400× faster than the widely used variant of QFM implemented in PAUP* on larger datasets. We also provide a theoretical analysis of the running time and memory requirements of QFM-FI. We compared QFM-FI with other state-of-the-art phylogeny reconstruction methods, including QFM, QMC, wQMC, wQFM, and ASTRAL, on both simulated and real biological datasets. Our results show that QFM-FI improves on QFM in both running time and tree quality, and produces trees comparable to those of state-of-the-art methods.
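As a rough illustration of what "amalgamating quartets" with a Fiduccia-Mattheyses-style search involves: QFM works divide-and-conquer, splitting the taxa into two groups so that as many input quartets as possible agree with the split, then recursing on each side. The Python sketch below shows only one bipartition step under stated assumptions. A quartet ab|cd is treated as satisfied when {a, b} and {c, d} fall on opposite sides, violated when the split is 2|2 but the pairs cross it, and deferred otherwise; the objective is assumed to be satisfied minus violated. The names `classify`, `score`, and `fm_pass` are hypothetical, and this is not the QFM-FI code from the repository.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# A quartet ab|cd is stored as ((a, b), (c, d)).
Quartet = Tuple[Tuple[str, str], Tuple[str, str]]


def classify(q: Quartet, side: Dict[str, int]) -> str:
    """Classify quartet ab|cd against a bipartition given as taxon -> 0 or 1."""
    (a, b), (c, d) = q
    sa, sb, sc, sd = side[a], side[b], side[c], side[d]
    if sa == sb and sc == sd and sa != sc:
        return "satisfied"  # ab on one side, cd on the other
    if sa + sb + sc + sd == 2:
        return "violated"   # a 2|2 split whose pairs cross the cut
    return "deferred"       # 3|1 or 4|0: left for the recursive subproblems


def score(quartets: Iterable[Quartet], side: Dict[str, int]) -> int:
    """Assumed objective: number of satisfied minus number of violated quartets."""
    total = 0
    for q in quartets:
        kind = classify(q, side)
        if kind == "satisfied":
            total += 1
        elif kind == "violated":
            total -= 1
    return total


def fm_pass(taxa: List[str], quartets: List[Quartet],
            side: Dict[str, int]) -> Dict[str, int]:
    """One FM-style pass: move each taxon at most once, keep the best partition seen."""
    current = dict(side)  # do not mutate the caller's partition
    best, best_score = dict(current), score(quartets, current)
    locked = set()
    while len(locked) < len(taxa):
        candidate: Optional[str] = None
        candidate_score = 0
        for t in taxa:
            if t in locked:
                continue
            current[t] ^= 1  # tentatively move t to the other side
            if 0 < sum(current.values()) < len(taxa):  # keep both sides non-empty
                s = score(quartets, current)
                if candidate is None or s > candidate_score:
                    candidate, candidate_score = t, s
            current[t] ^= 1  # undo the tentative move
        if candidate is None:
            break
        # Commit the best move even if it lowers the score (FM hill-climbing).
        current[candidate] ^= 1
        locked.add(candidate)
        if candidate_score > best_score:
            best, best_score = dict(current), candidate_score
    return best


if __name__ == "__main__":
    quartets = [(("A", "B"), ("C", "D")),
                (("A", "B"), ("C", "D")),
                (("A", "C"), ("B", "D"))]
    start = {"A": 0, "B": 1, "C": 0, "D": 1}
    best = fm_pass(["A", "B", "C", "D"], quartets, start)
    print(score(quartets, start), best, score(quartets, best))
    # prints: -1 {'A': 1, 'B': 1, 'C': 0, 'D': 0} 1
```

On this toy input the pass moves from {A, C}|{B, D} (score -1) to {A, B}|{C, D} (score 1), the split supported by the majority of quartets. This naive version rescans every quartet for every tentative move, which is the kind of cost that makes a straightforward implementation slow on millions of quartets; how QFM-FI avoids it is not shown here.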
QFM-FI is open source and available at https://github.com/sharmin-mim/qfm_java.