Institute for Biological and Medical Engineering, Schools of Engineering, Medicine and Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile.
ANID-Millennium Science Initiative Program-Millennium Institute for Integrative Biology (iBio), Santiago, Chile.
Protein Sci. 2022 Jun;31(6):e4337. doi: 10.1002/pro.4337.
The NusG protein family is structurally and functionally conserved in all domains of life. Its members bind RNA polymerases directly and regulate transcription processivity and termination. RfaH, a sub-family that diverged during the evolutionary history of NusG, is known for features distinct from those of canonical NusG proteins, which allow it to regulate the expression of virulence factors in enterobacteria in a DNA sequence-dependent manner. A striking feature is its structural interconversion between an active fold, which is the canonical NusG three-dimensional structure, and an autoinhibited fold, which is distinctively novel. How the RfaH sequence encodes this novel fold, and thereby a metamorphic protein, remains elusive. In this work, we used publicly available genomic RfaH protein sequences to construct a complete multiple sequence alignment, which we augmented with metagenomic sequences and curated by predicting their secondary structure propensities with JPred. From these sequences we calculated coevolving residue pairs using plmDCA and GREMLIN, which allowed us to detect an enrichment of key metamorphic contacts after sequence filtering. Finally, we combined our coevolutionary predictions with molecular dynamics to demonstrate that these interactions are sufficient to predict the structures of both native folds; coevolution-derived non-native contacts may play a key role in achieving the compact novel fold of RfaH. Altogether, emergent coevolutionary signals within RfaH sequences encode both the autoinhibited and the active folds of this protein, shedding light on the key interactions responsible for the action of this metamorphic protein.
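To illustrate what "coevolving pairs of residues" means in practice, here is a minimal sketch that scores column pairs of a multiple sequence alignment and checks the top-ranked pairs against a set of native contacts. It deliberately swaps in a simpler statistic, mutual information with the average product correction (APC), instead of the pseudolikelihood Potts models fitted by plmDCA and GREMLIN, and it omits sequence reweighting, pseudocounts and gap handling. The toy alignment, the function names, the `min_sep` cutoff and the native contact set are hypothetical and serve only as an example.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(msa, i, j):
    """Mutual information (natural log) between alignment columns i and j."""
    n = len(msa)
    ci = [seq[i] for seq in msa]
    cj = [seq[j] for seq in msa]
    fi, fj = Counter(ci), Counter(cj)
    fij = Counter(zip(ci, cj))
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi


def apc_scores(msa):
    """MI for every column pair, corrected with APC: MI(i,j) - mean_i * mean_j / mean_all."""
    length = len(msa[0])
    mi = {(i, j): mutual_information(msa, i, j) for i, j in combinations(range(length), 2)}
    overall = sum(mi.values()) / len(mi)
    if overall == 0.0:
        return mi  # no covariation anywhere, so there is no background to subtract
    col_mean = []
    for k in range(length):
        vals = [v for (i, j), v in mi.items() if k in (i, j)]
        col_mean.append(sum(vals) / len(vals))
    return {(i, j): v - col_mean[i] * col_mean[j] / overall for (i, j), v in mi.items()}


def top_pairs(msa, k, min_sep=1):
    """Return the k highest-scoring pairs whose sequence separation is at least min_sep."""
    scores = apc_scores(msa)
    eligible = [(pair, s) for pair, s in scores.items() if pair[1] - pair[0] >= min_sep]
    return sorted(eligible, key=lambda item: -item[1])[:k]


def contact_precision(predicted, native_contacts):
    """Fraction of predicted pairs that appear in a (hypothetical) native contact set."""
    if not predicted:
        return 0.0
    hits = sum(1 for pair, _ in predicted if pair in native_contacts)
    return hits / len(predicted)


# Hypothetical toy alignment: columns 0 and 3 covary perfectly (A<->E, D<->R),
# while columns 1 and 2 vary independently of every other column.
toy_msa = ["AKLE", "DKLR", "AGLE", "DGLR", "AKVE", "DKVR", "AGVE", "DGVR"]

best = top_pairs(toy_msa, k=1, min_sep=2)
print(best)  # the single top pair is (0, 3)
print(contact_precision(best, {(0, 3)}))  # 1.0
```

In this toy case only columns 0 and 3 share information, so after APC the pair (0, 3) keeps a score of ln 2 / 3 while every other pair scores zero. With real RfaH sequences, the same ranking step could be compared separately against contact sets taken from the active and the autoinhibited structures, which is the kind of comparison the enrichment of metamorphic contacts refers to.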