Schumbera Eric, Dormann Dorothee, Walther Andreas, Andrade-Navarro Miguel A
Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University, Hanns-Dieter-Hüsch-Weg 15, Mainz, 55128, Germany.
Institute of Molecular Physiology, Johannes Gutenberg University, Hanns-Dieter-Hüsch-Weg 17, Mainz, 55128, Germany.
BMC Genomics. 2025 Oct 6;26(1):883. doi: 10.1186/s12864-025-12132-5.
Arginine-glycine (RG)-rich motifs are among the most prevalent RNA-binding elements within intrinsically disordered regions (IDRs) of proteins and play crucial roles in RNA metabolism, gene regulation, and the formation of membraneless organelles via liquid-liquid phase separation (LLPS). Despite their biological relevance and implication in neurological disorders and cancer, the sequence features and context dependencies that define functional RG motifs remain poorly characterized owing to their disordered nature and sequence variability. In this study, we present a computational framework to dissect the sequence and structural context of RG motifs across the human proteome. By contrasting a functionally defined positive dataset (enriched for RNA-binding and phase-separating proteins) with a negative dataset of RG motif proteins lacking these annotations, we identified distinct compositional and contextual signatures. RG motifs in the functionally defined positive dataset show stronger enrichment of phenylalanine, tyrosine, aspartic acid, and asparagine, both within and around the motif, as well as nonrandom spatial relationships with structured RNA-binding domains. Notably, phenylalanine and tyrosine exhibit divergent positional and functional profiles, suggesting distinct mechanistic roles. Our analysis highlights the potential of sequence-based approaches to uncover functional determinants in disordered protein regions, advances our understanding of the properties of RG motifs, and offers a transferable framework for the study of other low-complexity motifs.
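The abstract does not state how RG motifs are delimited or how residue enrichment is scored. As an illustration only, the following minimal Python sketch shows one way to contrast residue composition in and around RG motifs between a positive and a negative protein set. It assumes a hypothetical regex-based motif definition (two or more RG dipeptides separated by at most four residues), a fixed flank of 10 residues on each side, and a log2 ratio of pooled residue frequencies; the names motif_windows, residue_frequencies and log2_enrichment are hypothetical, and none of these choices is taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical motif definition (assumption, not the paper's): two or more
# "RG" dipeptides, each separated by 0-4 arbitrary residues.
RG_MOTIF = re.compile(r"RG(?:.{0,4}RG)+")


def motif_windows(seq: str, flank: int = 10) -> list[str]:
    """Return each RG motif plus up to `flank` residues on either side."""
    windows = []
    for m in RG_MOTIF.finditer(seq):
        start = max(0, m.start() - flank)
        end = min(len(seq), m.end() + flank)
        windows.append(seq[start:end])
    return windows


def residue_frequencies(seqs, residues="FYDN", flank=10):
    """Pooled frequency of each residue across all motif windows in `seqs`."""
    counts = Counter()
    total = 0
    for seq in seqs:
        for window in motif_windows(seq, flank):
            counts.update(window)
            total += len(window)
    return {aa: (counts[aa] / total if total else 0.0) for aa in residues}


def log2_enrichment(pos_seqs, neg_seqs, residues="FYDN", flank=10, pseudo=1e-6):
    """log2(positive / negative) residue frequency; `pseudo` avoids division by zero."""
    pos = residue_frequencies(pos_seqs, residues, flank)
    neg = residue_frequencies(neg_seqs, residues, flank)
    return {aa: math.log2((pos[aa] + pseudo) / (neg[aa] + pseudo)) for aa in residues}
```

Usage, with toy sequences rather than real data: log2_enrichment(["MSRGFYRGDNRGGSYG"], ["MSRGAARGKKRGGSAG"]) yields positive values for F, Y, D and N because only the first sequence contains them. A real analysis would also need a background model, statistical testing, and positional resolution (within the motif versus in its flanks), which this sketch deliberately omits.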