Center for Synthetic Microbiology (SYNMIKRO), Philipps-Universität Marburg, Germany.
Bioinformatics and Systems Biology, Justus-Liebig-Universität, Giessen, Germany.
Nucleic Acids Res. 2021 Jan 25;49(2):986-1005. doi: 10.1093/nar/gkaa1229.
Extracytoplasmic function σ factors (ECFs) represent one of the major bacterial signal transduction mechanisms in terms of abundance, diversity and importance, particularly in mediating stress responses. Here, we performed a comprehensive phylogenetic analysis of this protein family by scrutinizing all proteins in the NCBI database. As a result, we identified an average of ∼10 ECFs per bacterial genome and 157 phylogenetic ECF groups that feature a conserved genetic neighborhood and a similar regulation mechanism. Our analysis expands previous classification efforts ∼50-fold, enriches many original ECF groups with previously unclassified proteins and identifies 22 entirely new ECF groups. The ECF groups are hierarchically related to each other and are further composed of subgroups with closely related sequences. This two-tiered classification allows for the accurate prediction of common promoter motifs and the inference of putative regulatory mechanisms across subgroups composing an ECF group. This comprehensive, high-resolution description of the phylogenetic distribution of the ECF family, together with the massive expansion of classified ECF sequences and an openly accessible data repository called 'ECF Hub' (https://www.computational.bio.uni-giessen.de/ecfhub), will serve as a powerful hypothesis-generator to guide future research in the field.