Hsu Justin Bo-Kai, Huang Kai-Yao, Weng Tzu-Ya, Huang Chien-Hsun, Lee Tzong-Yi
Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-chu, 300, Taiwan
J Comput Aided Mol Des. 2014 Jan;28(1):49-60. doi: 10.1007/s10822-014-9706-6. Epub 2014 Jan 19.
Pre-mRNA splicing is carried out through the interaction of RNA sequence elements with a variety of RNA splicing-related proteins (SRPs) (e.g., the spliceosome and splicing factors). Alternative splicing, an important mode of post-transcriptional regulation in eukaryotes, gives rise to multiple mature mRNA isoforms that encode functionally diverse proteins. However, the regulation of RNA splicing is not yet fully elucidated, partly because SRPs have not been exhaustively identified and their experimental identification is labor-intensive. We were therefore motivated to design a new method for identifying SRPs together with their functional roles in the regulation of RNA splicing. Experimentally verified SRPs were manually curated from research articles. Following the functional annotation of the Splicing Related Gene Database, the collected SRPs were further categorized into four functional groups: small nuclear Ribonucleoprotein, Splicing Factor, Splicing Regulation Factor and Novel Spliceosome Protein. The composition of amino acid pairs differs markedly among the four functional groups of SRPs. Support vector machines (SVMs) were then used to learn predictive models for identifying SRPs as well as their functional roles. Cross-validation shows that SVM models trained with significant amino acid pairs and functional domains provide better predictive performance. In addition, independent testing demonstrates that the proposed method can accurately identify SRPs in mammals and plants and can effectively distinguish SRPs from RNA-binding proteins. This investigation provides a practical means of identifying potential SRPs and a perspective for exploring the regulation of RNA splicing.
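As an illustration of the amino acid pair composition mentioned above, the following is a minimal sketch of how such a feature vector might be computed from a protein sequence. The abstract does not define the pairs precisely, so this sketch assumes adjacent residue pairs (dipeptides) over the 20 standard amino acids, normalized by the number of valid pairs; the function name `pair_composition` and the constants `AMINO_ACIDS` and `PAIRS` are hypothetical and not taken from the paper.

```python
from itertools import product

# Assumption: the 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered pairs, in a fixed order so every vector is comparable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def pair_composition(sequence):
    """Return a 400-element list of adjacent amino acid pair frequencies.

    Pairs containing a non-standard residue (e.g. 'X') are skipped.
    Frequencies are normalized by the number of valid pairs counted.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    # Slide over the sequence one residue at a time, taking adjacent pairs.
    for first, second in zip(seq, seq[1:]):
        pair = first + second
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short, or no valid pairs: return an all-zero vector.
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]
```

A vector of this kind could then serve as input to an SVM classifier (for example, scikit-learn's `SVC`), although the kernel, parameters, and feature selection used by the authors are not stated in the abstract.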