Rafi Abdur, Rumi Ahmed Mahir Sultan, Hakim Sheikh Azizul, Tahmid Md Toki, Momin Rabib Jahin Ibn, Zaman Tanjeem Azwad, Reaz Rezwana, Bayzid Md Shamsuzzoha
Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh.
Bioinform Adv. 2025 Mar 13;5(1):vbaf053. doi: 10.1093/bioadv/vbaf053. eCollection 2025.
Summary methods are becoming increasingly popular for species tree estimation from multi-locus data in the presence of gene tree discordance. Accurate Species TRee Algorithm (ASTRAL), a leading method in this class, solves the Maximum Quartet Support Species Tree problem within a constrained solution space, while heuristics like Weighted Quartet Fiduccia-Mattheyses (wQFM) and Weighted Quartet MaxCut (wQMC) use weighted quartets and a divide-and-conquer strategy. Recent studies showed wQFM to be more accurate than ASTRAL and wQMC, though its scalability is hindered by the computational demands of explicitly generating and weighting quartets. Here, we introduce wQFM-TREE, a novel summary method that enhances wQFM by avoiding explicit quartet generation and weighting, enabling its application to large datasets.
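To make the scalability bottleneck concrete, the following is a minimal sketch of explicit quartet generation and weighting, the step that wQFM-TREE is designed to avoid. It is not the wQFM or wQFM-TREE implementation. It assumes gene trees are given as nested Python tuples of taxon names and that a quartet's weight is simply the number of gene trees that induce it; the function names `clades`, `induced_quartet`, and `weighted_quartets` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def clades(tree):
    """Return (leaf set of `tree`, leaf sets of all its internal nodes).

    A tree is assumed to be a taxon name (str) or a tuple of subtrees.
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found.extend(child_found)
    found.append(leaves)
    return leaves, found


def induced_quartet(clade_list, four):
    """Return the quartet topology on the sorted 4-tuple `four`, or None.

    The tree is read as unrooted: the split xy|zw is induced when some
    clade contains exactly one of the two pairs among the four taxa.
    """
    a, b, c, d = four
    candidates = (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c)))
    for pair1, pair2 in candidates:
        for clade in clade_list:
            inside = {t for t in four if t in clade}
            if inside == set(pair1) or inside == set(pair2):
                return (pair1, pair2)
    return None  # unresolved (e.g. a polytomy)


def weighted_quartets(gene_trees):
    """Count, for every 4-taxon subset, how many gene trees induce each topology."""
    weights = Counter()
    for tree in gene_trees:
        leaves, clade_list = clades(tree)
        # Enumerating all 4-subsets costs O(n^4) per gene tree.
        for four in combinations(sorted(leaves), 4):
            quartet = induced_quartet(clade_list, four)
            if quartet is not None:
                weights[quartet] += 1
    return weights


if __name__ == "__main__":
    trees = [
        (("A", "B"), ("C", "D")),
        (("A", "B"), ("C", "D")),
        (("A", "C"), ("B", "D")),
    ]
    for quartet, weight in weighted_quartets(trees).items():
        print(quartet, weight)
```

Because the number of 4-taxon subsets grows as n choose 4 (about 4.1 × 10^10 for 1000 taxa), this explicit enumeration quickly becomes impractical for datasets with hundreds or thousands of taxa.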
Extensive simulations under diverse and challenging model conditions, with hundreds or thousands of taxa and genes, consistently demonstrate that wQFM-TREE matches or improves upon the accuracy of ASTRAL. It outperformed ASTRAL in 25 of 27 model conditions (statistically significant in 20) involving 200-1000 taxa. Moreover, applying wQFM-TREE to re-analyze the green plant dataset from the One Thousand Plant Transcriptomes Initiative produced a tree highly congruent with established evolutionary relationships of plants. wQFM-TREE's remarkable accuracy and scalability make it a strong competitor to leading methods. Its algorithmic and combinatorial innovations also enhance quartet-based computations, advancing phylogenetic estimation.
wQFM-TREE is freely available in open source form at https://github.com/abdur-rafi/wQFM-TREE.