CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, CAS Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China; Synthetic Biology and Biotechnology Laboratory, State Key Laboratory of Bioreactor Engineering, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China; Signal Transduction Laboratory, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA.
CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, CAS Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China.
Cell Syst. 2018 Nov 28;7(5):510-520.e4. doi: 10.1016/j.cels.2018.09.002. Epub 2018 Nov 7.
Alternative splicing (AS) is generally regulated by trans-acting splicing factors that specifically bind to cis-elements in pre-mRNAs. The human genome encodes ∼1,500 RNA binding proteins (RBPs) that potentially regulate AS, yet their functions remain largely unknown. To explore their potential activities, we fused the putative functional domains of RBPs to a sequence-specific RNA-binding domain and systematically analyzed how these engineered factors affect splicing. We discovered that ∼80% of low-complexity domains in endogenous RBPs displayed distinct context-dependent activities in regulating splicing, indicating that AS is under more extensive regulation than previously expected. We developed a machine learning approach to classify and predict the activities of RBPs based on their sequence compositions and further validated this model using endogenous RBPs and synthetic polypeptides. These results represent a systematic inspection, modeling, prediction, and validation of how RBP sequences affect their activities in controlling splicing, paving the way for de novo engineering of artificial splicing factors.