50/50 Expressional Odds of Retention Signifies the Distinction between Retained Introns and Constitutively Spliced Introns in Arabidopsis thaliana.

Author information

Mao Rui, Liang Chun, Zhang Yang, Hao Xingan, Li Jinyan

Affiliations

College of Information Engineering, Northwest A&F University, Yangling, China.

Department of Biology, Miami University, Oxford, OH, United States.

Publication information

Front Plant Sci. 2017 Oct 9;8:1728. doi: 10.3389/fpls.2017.01728. eCollection 2017.

Abstract

Intron retention, one of the most prevalent alternative splicing events in plants, leaves introns in mature mRNAs. However, the features that distinguish retained introns (RIs) from constitutively spliced introns (CSIs) are still poorly understood. This work proposes a computational pipeline to discover novel RIs from multiple next-generation RNA sequencing (RNA-Seq) datasets of Arabidopsis thaliana. Using this pipeline, we detected 3,472 novel RIs from 18 RNA-Seq datasets and re-confirmed 1,384 RIs that are currently annotated in the TAIR10 database. We also use the expression of intron-containing isoforms as a new feature in addition to the conventional features. Based on these features, RIs are highly distinguishable from CSIs by machine learning methods, especially when the expressional odds of retention (i.e., the expression ratio of the RI-containing isoforms relative to the isoforms without RIs for the same gene) reaches or exceeds 50/50. In this case, a Random Forest classifier separates RIs from CSIs with an AUC (area under the receiver operating characteristic curve) of 0.95. The characteristics most closely associated with RIs include low splice-site strength, high similarity with the flanking exon sequences, a low occurrence percentage of YTRAY near the acceptor site, the presence of putative intronic splicing silencers (ISSs, i.e., AG/GA-rich motifs) and intronic splicing enhancers (ISEs, i.e., TTTT-containing motifs), and enrichment of Serine/Arginine-Rich (SR) proteins and heterogeneous nuclear ribonucleoproteins (hnRNPs).
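The expressional odds of retention defined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the isoform identifiers, and the expression values are hypothetical, and it assumes per-isoform expression estimates (in any consistent unit) are already available for one gene. Under this reading, 50/50 odds correspond to a ratio of 1.0.

```python
from typing import Mapping, Set


def retention_odds(isoform_expr: Mapping[str, float], ri_isoforms: Set[str]) -> float:
    """Expression of RI-containing isoforms divided by expression of the
    isoforms without the RI, for a single gene (hypothetical helper)."""
    with_ri = sum(v for k, v in isoform_expr.items() if k in ri_isoforms)
    without_ri = sum(v for k, v in isoform_expr.items() if k not in ri_isoforms)
    if without_ri == 0:
        # No expression from spliced isoforms: odds are unbounded,
        # or undefined when nothing is expressed at all.
        return float("inf") if with_ri > 0 else float("nan")
    return with_ri / without_ri


def meets_50_50(odds: float) -> bool:
    """True when the odds reach or exceed 50/50, i.e. a ratio of at least 1."""
    return odds >= 1.0


# Illustrative values only: one gene with three isoforms, one of which retains the intron.
expr = {"iso_a": 6.0, "iso_b": 2.0, "iso_c": 4.0}
odds = retention_odds(expr, ri_isoforms={"iso_a"})
print(odds, meets_50_50(odds))
```

In this toy case the RI-containing isoform is expressed at 6.0 against a combined 6.0 for the other two isoforms, so the intron sits exactly at the 50/50 threshold.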

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95d5/5640774/798cab4faba4/fpls-08-01728-g0001.jpg
