Aghamirzaie Delasa, Batra Dhruv, Heath Lenwood S, Schneider Andrew, Grene Ruth, Collakova Eva
Genetics, Bioinformatics and Computational Biology Program, Virginia Tech, Blacksburg, VA, 24061, USA.
Bradley Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, 24061, USA.
BMC Genomics. 2015 Nov 14;16:928. doi: 10.1186/s12864-015-2108-x.
Transcriptomics has revealed transcripts that differ in coding potential and strand orientation. Alternative splicing (AS) can yield proteins with altered numbers and types of functional domains, suggesting the global occurrence of transcriptional and post-transcriptional events. Many biological processes, including seed maturation and desiccation, are regulated post-transcriptionally (e.g., by AS), which leads to the production of more than one coding or noncoding sense transcript from a single locus.
We present an integrated computational framework to predict isoform-specific functions of plant transcripts. This framework includes CodeWise, a novel plant-specific weighted support vector machine classifier that predicts the coding potential of transcripts with over 96% accuracy, together with several other tools enabling global sequence similarity, functional domain, and co-expression network analyses. First, the framework was applied to all 103,106 transcripts detected in developing soybean embryos, of which CodeWise predicted 13% to be noncoding RNAs. Second, to investigate the role of AS during soybean embryo development, a population of 2,938 alternatively spliced and differentially expressed splice variants was analyzed and mined with respect to the timing of their expression. Conserved domain analyses revealed that AS resulted in global changes in the number, types, and extent of truncation of functional domains in protein variants. Isoform-specific co-expression network analysis using ArrayMining, together with clustering analyses, revealed specific sub-networks and potential interactions among the components of selected signaling pathways related to seed maturation and the acquisition of desiccation tolerance. These signaling pathways involved abscisic acid- and FUSCA3-related transcripts, several of which were classified as noncoding and/or antisense transcripts and were co-expressed with the corresponding coding transcripts. Noncoding and antisense transcripts likely play important regulatory roles in seed maturation- and desiccation-related signaling in soybean.
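The abstract does not give CodeWise's features or training procedure. As a minimal sketch of the general technique only, assuming "weighted" means per-class misclassification costs that offset an imbalance between coding and noncoding training examples, a class-weighted linear SVM can be trained by subgradient descent on the hinge loss. The two features, the toy data, and all function names below are hypothetical; this is not CodeWise's implementation.

```python
def balanced_class_weights(y):
    """Weight each class by n / (2 * n_class) so the minority class counts more."""
    n = len(y)
    return {label: n / (2 * y.count(label)) for label in set(y)}


def objective(w, b, X, y, cw, lam):
    """L2-regularized, class-weighted average hinge loss (the quantity minimized)."""
    hinge = sum(
        cw[yi] * max(0.0, 1 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
        for xi, yi in zip(X, y)
    )
    return 0.5 * lam * sum(wj * wj for wj in w) + hinge / len(X)


def train_weighted_svm(X, y, lam=0.01, eta=0.02, iters=10000):
    """Full-batch subgradient descent; returns the best (w, b) seen."""
    cw = balanced_class_weights(y)
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    best = (objective(w, b, X, y, cw, lam), w[:], b)
    for _ in range(iters):
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty (bias unpenalized)
        gb = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # inside the margin: this point's hinge term is active
                for j in range(d):
                    gw[j] -= cw[yi] * yi * xi[j] / n
                gb -= cw[yi] * yi / n
        w = [wj - eta * gj for wj, gj in zip(w, gw)]
        b -= eta * gb
        f = objective(w, b, X, y, cw, lam)
        if f < best[0]:  # subgradient steps are not monotone, so keep the best
            best = (f, w[:], b)
    return best[1], best[2]


def predict(w, b, x):
    """+1 = predicted coding, -1 = predicted noncoding."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical toy features per transcript: [ORF coverage, codon-usage score].
# Four coding (+1) vs two noncoding (-1) examples mimic class imbalance.
X = [[0.8, 0.7], [0.9, 0.8], [0.7, 0.9], [0.85, 0.6],
     [0.1, 0.2], [0.2, 0.1]]
y = [1, 1, 1, 1, -1, -1]
w, b = train_weighted_svm(X, y)
print([predict(w, b, x) for x in X])
```

Here the balanced weights give the two noncoding examples twice the per-example cost of the four coding ones, so neither class dominates the loss. Tracking the best iterate rather than the last is a standard safeguard for subgradient methods, whose objective does not decrease monotonically.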
This work demonstrates how our integrated framework can be implemented to make experimentally testable predictions regarding the coding potential, co-expression, co-regulation, and function of transcripts and proteins related to a biological process of interest.
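The abstract does not describe how the isoform-specific co-expression networks were built with ArrayMining. One common, minimal approach, assumed here rather than taken from the paper, links two transcripts when the Pearson correlation of their expression profiles across developmental stages reaches a threshold in absolute value. The transcript IDs, expression values, and the 0.9 cutoff below are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:  # a constant profile has undefined correlation
        return 0.0
    return cov / sqrt(va * vb)


def coexpression_edges(profiles, threshold=0.9):
    """Return (id1, id2, r) for every transcript pair with |r| >= threshold.

    Using |r| keeps strongly anti-correlated pairs as well as correlated ones.
    """
    edges = []
    for (i, a), (j, b) in combinations(sorted(profiles.items()), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            edges.append((i, j, round(r, 3)))
    return edges


# Hypothetical isoform-level expression across five embryo stages
profiles = {
    "geneA.1_coding": [1.0, 2.0, 4.0, 8.0, 9.0],
    "geneA.2_antisense": [1.2, 2.1, 3.9, 7.5, 9.3],
    "geneB.1_coding": [9.0, 7.0, 4.0, 2.0, 1.0],
    "geneC.1_coding": [5.0, 1.0, 6.0, 2.0, 5.0],
}
for edge in coexpression_edges(profiles):
    print(edge)
```

Because each isoform keeps its own profile, two splice variants of the same locus can land in different sub-networks, which is the property an isoform-specific analysis relies on.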