Institute of Applied Biosciences (INAB), Center for Research & Technology Hellas (CERTH), GR-57001 Thessalonica, Greece.
Lab. Technological Advances for Genomics & Clinics (TAGC), Université d'Aix-Marseille (AMU), INSERM Unit U1090, 163, Avenue de Luminy, 13288 Marseille cedex 09, France.
Microb Genom. 2020 Nov;6(11). doi: 10.1099/mgen.0.000429.
As genome sequencing efforts unveil the genetic diversity of the biosphere at unprecedented speed, there is a need to accurately describe the structural and functional properties of groups of extant species whose genomes have been sequenced, as well as their inferred ancestors, at any given taxonomic level of their phylogeny. Elaborate approaches for the reconstruction of ancestral states at the sequence level have been developed, subsequently augmented by methods based on gene content. While these approaches of sequence or gene-content reconstruction have been successfully deployed, there has been less progress on the explicit inference of functional properties of ancestral genomes, in terms of metabolic pathways and other cellular processes. Herein, we describe PathTrace, an efficient algorithm for parsimony-based reconstructions of the evolutionary history of individual metabolic pathways, which are pivotal representations of key functional modules of the cell. The algorithm is implemented as a five-step process through which pathways are represented as fuzzy vectors, where each enzyme is associated with a taxonomic conservation value derived from the phylogenetic profile of its protein sequence. The method is evaluated with a selected benchmark set of pathways against collections of genome sequences from key data resources. By deploying a pangenome-driven approach for pathway sets, we demonstrate that the inferred patterns are largely insensitive to noise, as opposed to gene-content reconstruction methods. In addition, the resulting reconstructions are closely correlated with the evolutionary distance of the taxa under study, suggesting that a diligent selection of target pangenomes is essential for maintaining cohesiveness of the method and consistency of the inference, serving as an internal control for an arbitrary selection of queries.
The PathTrace method is a first step towards the large-scale analysis of metabolic pathway evolution and our deeper understanding of functional relationships reflected in emerging pangenome collections.
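The fuzzy-vector representation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the conservation value or list the five steps, so everything here is an assumption made for illustration: the conservation value is taken to be the fraction of genomes in a taxon in which the enzyme is detected, and the names `conservation_value`, `pathway_fuzzy_vector`, the enzyme labels, and the genome labels are all hypothetical. This is not the PathTrace implementation.

```python
from typing import Dict, List


def conservation_value(profile: Dict[str, bool], taxon_genomes: List[str]) -> float:
    """Fraction of a taxon's genomes in which the enzyme is detected.

    Assumed definition for illustration only; the abstract does not
    specify how the conservation value is computed.
    """
    if not taxon_genomes:
        raise ValueError("taxon has no genomes")
    hits = sum(1 for genome in taxon_genomes if profile.get(genome, False))
    return hits / len(taxon_genomes)


def pathway_fuzzy_vector(
    pathway: List[str],
    profiles: Dict[str, Dict[str, bool]],
    taxon_genomes: List[str],
) -> List[float]:
    """Represent a pathway as one conservation value per enzyme, in pathway order."""
    return [conservation_value(profiles[enzyme], taxon_genomes) for enzyme in pathway]


# Toy phylogenetic profiles: presence/absence of each enzyme in four genomes.
# Enzyme and genome names are hypothetical.
profiles = {
    "enzA": {"g1": True, "g2": True, "g3": True, "g4": False},
    "enzB": {"g1": True, "g2": False, "g3": False, "g4": False},
    "enzC": {"g1": False, "g2": False, "g3": False, "g4": False},
}
pathway = ["enzA", "enzB", "enzC"]
taxon = ["g1", "g2", "g3", "g4"]

vector = pathway_fuzzy_vector(pathway, profiles, taxon)
print(vector)  # [0.75, 0.25, 0.0]
```

Under this assumed definition, each entry lies between 0 (enzyme absent from the taxon) and 1 (enzyme present in every genome of the taxon), so a pathway that is only partly conserved is described by graded values rather than a binary present/absent call.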