Koshkarov Aleksandr, Tahiri Nadia
Department of Computer Science, University of Sherbrooke, 2500, Boulevard de l'Université, Sherbrooke, Québec J1K 2R1, Canada.
Bioinform Adv. 2023 Mar 3;3(1):vbad023. doi: 10.1093/bioadv/vbad023. eCollection 2023.
For many years, evolutionary and molecular biologists have worked with phylogenetic supertrees, which are directed acyclic graph structures. In standard approaches, a supertree is obtained by combining a set of phylogenetic trees defined on different but overlapping sets of taxa (i.e. species). More recent approaches propose alternative solutions for supertree inference. Testing new metrics for comparing supertrees, and adapting clustering algorithms to overlapping phylogenetic trees with different numbers of leaves, requires large amounts of data. In this context, designing a new approach and developing a computer program that generates clusters of phylogenetic trees with different numbers of overlapping leaves are key steps toward advancing research on phylogenetic supertrees and evolution. The main objective of the project is to propose a new approach for simulating clusters of phylogenetic trees defined on different, but mutually overlapping, sets of taxa, while taking biological events into account. The proposed generator produces a specified number of clusters of phylogenetic trees in Newick format, with a variable number of leaves per tree and a defined level of overlap between the trees in each cluster.
GPTree Cluster, a Python 3.7 script that implements the discussed approach, is freely available at: https://github.com/tahiri-lab/GPTree/tree/GPTreeCluster.
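To make the idea of a cluster of overlapping trees concrete, the following is a minimal sketch and not the GPTree Cluster algorithm itself. It assumes a simplified notion of overlap (every tree in a cluster contains the same fixed core of shared taxa), builds random binary topologies without branch lengths, and does not model biological events. The function names `random_newick`, `generate_cluster`, and `leaf_set`, and all of their parameters, are hypothetical and chosen for illustration only.

```python
import random
import re


def random_newick(taxa, rng):
    """Return a random rooted binary topology over `taxa` as a Newick string.

    Illustrative only: subtrees are joined uniformly at random and no
    branch lengths are produced.
    """
    nodes = list(taxa)
    while len(nodes) > 1:
        # Remove two random subtrees and join them under a new internal node.
        a = nodes.pop(rng.randrange(len(nodes)))
        b = nodes.pop(rng.randrange(len(nodes)))
        nodes.append(f"({a},{b})")
    return nodes[0] + ";"


def generate_cluster(pool, n_trees, n_shared, min_leaves, max_leaves, seed=None):
    """Generate one cluster of Newick trees that all share `n_shared` taxa.

    Assumption: overlap is modelled as a fixed core of shared taxa; each tree
    then receives additional taxa drawn from the rest of the pool.
    """
    pool = list(pool)
    if not 1 <= n_shared <= min_leaves <= max_leaves:
        raise ValueError("need 1 <= n_shared <= min_leaves <= max_leaves")
    if max_leaves > len(pool):
        raise ValueError("max_leaves exceeds the size of the taxon pool")

    rng = random.Random(seed)
    shared = rng.sample(pool, n_shared)
    shared_set = set(shared)
    rest = [t for t in pool if t not in shared_set]

    trees = []
    for _ in range(n_trees):
        n_leaves = rng.randint(min_leaves, max_leaves)
        extra = rng.sample(rest, n_leaves - n_shared)
        trees.append(random_newick(shared + extra, rng))
    return trees


def leaf_set(newick):
    """Extract leaf labels from a Newick string without branch lengths."""
    return set(re.findall(r"[^(),;]+", newick))


if __name__ == "__main__":
    taxa = [f"t{i}" for i in range(20)]
    cluster = generate_cluster(taxa, n_trees=5, n_shared=4,
                               min_leaves=6, max_leaves=10, seed=1)
    for tree in cluster:
        print(tree)
    common = set.intersection(*(leaf_set(t) for t in cluster))
    print("taxa shared by all trees:", sorted(common))
```

In this sketch the overlap level is controlled by `n_shared` relative to the tree sizes; a generator aimed at realistic benchmarks would likely need a richer overlap definition and a proper evolutionary model.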