Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Catalonia, Spain.
Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain.
Mol Biol Evol. 2021 Oct 27;38(11):5204-5208. doi: 10.1093/molbev/msab234.
Possvm (Phylogenetic Ortholog Sorting with Species oVerlap and MCL [Markov clustering algorithm]) is a tool that automates the identification of clusters of orthologous genes from precomputed phylogenetic trees and the classification of gene families. It identifies orthology relationships between genes with the species overlap algorithm, which infers taxonomic information (speciation versus duplication events) from the gene tree topology, and then applies MCL to identify orthology clusters and provide annotated gene families. Our benchmarking shows that this approach, when provided with accurate phylogenies, identifies manually curated orthogroups with very high precision and recall. Overall, Possvm automates the routine process of gene tree inspection and annotation in a highly interpretable manner, and provides reusable outputs and phylogeny-aware gene annotations that can inform comparative genomics and gene family evolution analyses.
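To make the two-step idea concrete, the following is a toy Python sketch, not Possvm's implementation. It assumes a rooted binary gene tree encoded as nested 2-tuples whose leaves are labelled `species_gene` (both the encoding and the naming convention are hypothetical; real inputs would be parsed from tree files). The species overlap rule labels a node as a speciation when its two child subtrees share no species, and every gene pair split by such a node is recorded as orthologous. Plain MCL (alternating matrix squaring and element-wise inflation) is then run on the resulting orthology graph. The helper names (`orthologous_pairs`, `mcl`) and the parameter defaults are illustrative assumptions.

```python
from itertools import product

# Hypothetical encoding: a rooted binary gene tree as nested 2-tuples whose
# leaves are "species_gene" strings (the species prefix is an assumption).


def species(leaf):
    """Species label = text before the first underscore (assumed convention)."""
    return leaf.split("_", 1)[0]


def leaves(node):
    """All leaf labels below a node."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def orthologous_pairs(tree):
    """Species overlap rule: if the two child subtrees of a node share no
    species, the node is a speciation and every cross-child gene pair is an
    ortholog pair; otherwise it is a duplication and contributes no pairs."""
    pairs = set()

    def visit(node):
        if isinstance(node, str):
            return
        left, right = node
        left_genes, right_genes = leaves(left), leaves(right)
        left_sp = {species(g) for g in left_genes}
        right_sp = {species(g) for g in right_genes}
        if not left_sp & right_sp:  # no species overlap -> speciation node
            for a, b in product(left_genes, right_genes):
                pairs.add(frozenset((a, b)))
        visit(left)
        visit(right)

    visit(tree)
    return pairs


def normalize_columns(m):
    """Rescale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Dense square matrix product on nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(genes, pairs, inflation=2.0, iterations=30):
    """Plain Markov clustering: alternate expansion (matrix squaring) and
    inflation (element-wise power followed by column renormalisation)."""
    n = len(genes)
    idx = {g: i for i, g in enumerate(genes)}
    # Unweighted adjacency with self-loops, a common MCL preprocessing step.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for pair in pairs:
        a, b = tuple(pair)
        m[idx[a]][idx[b]] = m[idx[b]][idx[a]] = 1.0
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)  # expansion
        m = normalize_columns([[v ** inflation for v in row] for row in m])  # inflation
    # Assign each gene (column) to the attractor row holding most of its weight.
    clusters = {}
    for j in range(n):
        attractor = max(range(n), key=lambda i: m[i][j])
        clusters.setdefault(attractor, set()).add(genes[j])
    return sorted(sorted(c) for c in clusters.values())


# Toy tree: a human-specific duplication (A1, A2) inside family A, plus family B.
tree = ((("human_A1", "human_A2"), "mouse_A1"), ("human_B1", "mouse_B1"))
print(mcl(sorted(leaves(tree)), orthologous_pairs(tree)))
# → [['human_A1', 'human_A2', 'mouse_A1'], ['human_B1', 'mouse_B1']]
```

In this toy tree the root is a duplication (both children contain human and mouse genes), so no ortholog pairs cross families A and B, and the two human in-paralogs A1 and A2 end up in one cluster with their mouse ortholog. Reading clusters by each column's strongest attractor is a simplification chosen here to make the output deterministic.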