Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 23/A, 43124, Parma, Italy.
Biosystems. 2021 Sep;207:104468. doi: 10.1016/j.biosystems.2021.104468. Epub 2021 Jun 30.
In eukaryotes, RNA polymerase II (Pol II) is responsible for the synthesis of all mRNAs and myriad short and long untranslated RNAs, whose production involves close spatiotemporal coordination between transcription, RNA processing and chromatin modification. Crucial for this coordination is an unusual C-terminal domain (CTD) of the largest Pol II subunit, made of tandem repeats (26 in yeast, 52 in chordates) of a heptapeptide with the consensus sequence YSPTSPS. Although largely unstructured and with poor sequence content, the Pol II CTD derives its extraordinary functional versatility from the fact that each amino acid in the heptapeptide can be posttranslationally modified, and that different combinations of CTD covalent marks are specifically recognized by different protein binding partners. These features have led to the proposal of a Pol II CTD code, but authors generally use this expression with some caution, as shown by the frequent use of quotation marks around the word 'code'. Based on the theoretical framework of code biology, it is argued here that the Pol II CTD modification system meets the requirements of a true organic code, in which different CTD modification states represent organic signs whose organic meanings are biological reactions contributing to the many facets of RNA biogenesis in coordination with RNA synthesis by Pol II. Importantly, the Pol II CTD code is instantiated by adaptor proteins possessing at least two distinct domains, one of which is devoted to the specific recognition of CTD modification profiles. Furthermore, code rules can be altered by experimentally interchanging the CTD recognition domains of different adaptor proteins, a fact that argues in favor of the arbitrariness, and thus the bona fide character, of the Pol II CTD code.
Since the growing family of CTD adaptors includes RNA binding proteins and histone modification complexes, the Pol II CTD code is by its nature integrated with other organic codes, in particular the splicing code and the histone code. These issues are discussed in light of fascinating developments in Pol II CTD research, such as the discovery of novel modifications at non-consensus sites, the recently recognized CTD physicochemical properties favoring liquid-liquid phase separation, and the finding that the Pol II CTD, which originated before the divergence of most extant eukaryotic taxa, has expanded and diversified with developmental complexity in animals and plants.