Genome Evolution and Ecology Group, Department of Functional and Evolutionary Ecology, University of Vienna, Vienna, Austria.
Vienna Doctoral School of Ecology and Evolution, University of Vienna, Vienna, Austria.
mSystems. 2024 Jun 18;9(6):e0094823. doi: 10.1128/msystems.00948-23. Epub 2024 May 3.
The majority of newly discovered archaeal lineages remain without a cultivated representative, but scarce experimental data from the cultivated organisms show that they harbor distinct functional repertoires. To unveil the ecological and evolutionary impact of Archaea from metagenomics, new computational methods need to be developed, followed by in-depth analysis. Among them is the genome-wide protein fusion screening performed here. Natural fusions and fissions of genes not only contribute to microbial evolution but also complicate the correct identification and functional annotation of sequences. The products of these processes can be defined as fusion (or composite) proteins, which consist of two or more domains originally encoded by different genes, and split proteins, which originate from the separation of a gene in two (fission). Identifying fusions is required for proper phylogenetic reconstructions and metabolic pathway completeness assessments, while mappings between fused and unfused proteins can fill some of the existing gaps in metabolic models. In the archaeal genome-wide screening, more than 1,900 fusion/fission protein clusters were identified, belonging to both newly sequenced and well-studied lineages. These protein families are mainly associated with different types of metabolism and with genetic and cellular processes. Moreover, 162 of the identified fusion/fission protein families are archaeal specific, having no identified fused homolog within the bacterial domain. Our approach was validated by the identification of experimentally characterized fusion/fission cases. However, around 25% of the identified fusion/fission families lack functional annotations for both composite and split states, showing the need for experimental characterization in Archaea.
IMPORTANCE: Genome-wide fusion screening has never been performed in Archaea on a broad taxonomic scale. The overlay of multiple computational techniques allows the detection of a fine-grained set of predicted fusion/fission families, instead of rough estimations based on conserved domain annotations only. The exhaustive mapping of fused proteins to bacterial organisms allows us to capture fusion/fission families that are specific to archaeal biology, as well as to identify links between bacterial and archaeal lineages based on the co-occurrence of taxonomically restricted proteins and their sequence features. Furthermore, the identification of poorly characterized lineage-specific fusion proteins opens up possibilities for future experimental and computational investigations. This approach enhances our understanding of Archaea in general and provides potential candidates for in-depth studies in the future.