Department of Biological Sciences, Texas Tech University, Lubbock, Texas, USA.
PLoS One. 2010 May 26;5(5):e10710. doi: 10.1371/journal.pone.0010710.
MicroRNAs (miRNAs) and trans-acting small-interfering RNAs (tasi-RNAs) are small RNAs (smRNAs), 20-22 nt long, generated from hairpin secondary structures and antisense transcripts, respectively. They regulate gene expression by Watson-Crick pairing to a target mRNA, altering its expression through mechanisms related to RNA interference. The high sequence homology of plant miRNAs to their targets has been the mainstay of miRNA prediction algorithms. These algorithms have limited predictive power in other kingdoms, where miRNA complementarity is less conserved, even though transitive processes (production of antisense smRNAs) are active in eukaryotes. We hypothesize that antisense transcription and associated smRNAs are biomarkers that can be computationally modeled for gene discovery.
We explored sense and antisense gene expression in rice (Oryza sativa), as well as in C. elegans, using publicly available whole-genome tiling array transcriptome data and sequenced smRNA libraries, and found evidence of transitivity of MIRNA genes similar to that found in Arabidopsis. Statistical analysis of antisense transcript abundances, the presence of antisense ESTs, and association with smRNAs suggests that several hundred Arabidopsis 'orphan' hypothetical genes are non-coding RNAs. Consistent with this hypothesis, we found novel Arabidopsis homologues of some MIRNA genes on the antisense strand of previously annotated protein-coding genes. To build a prediction model of miRNA targets, we applied a Support Vector Machine (SVM) that used the thermodynamic energy of binding together with novel expression features: sense/antisense transcription topology and siRNA abundances. When trained on targets, the SVM predicted the "ancient" (deeply conserved) class of validated Arabidopsis MIRNA genes with an accuracy of 84%, and the "new", rapidly evolving MIRNA genes with an accuracy of 76%.
Antisense and smRNA expression features, combined with computational methods, may identify novel MIRNA genes and other non-coding RNAs in plants and potentially in other kingdoms, providing insight into antisense transcription, miRNA evolution, and post-transcriptional gene regulation.
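To make the SVM step more concrete, the following is a minimal sketch of a linear SVM trained on three features of the kind the abstract names: binding energy, a sense/antisense expression feature, and siRNA abundance. It is not the authors' implementation. The feature encoding, the toy values, the hyperparameters, and the function names `train_linear_svm` and `predict` are all assumptions made for illustration, and the abstract does not state which kernel or software was used.

```python
from typing import List, Sequence, Tuple

# Hypothetical, standardized feature vectors (NOT data from the study):
#   (binding free energy, antisense/sense log-ratio, log siRNA abundance)
# Label +1 = target-like locus, -1 = background locus.
TOY_X: List[Tuple[float, float, float]] = [
    (-1.2, 1.0, 0.9), (-0.9, 0.8, 1.1), (-1.1, 1.2, 0.7), (-0.8, 0.9, 1.0),
    (0.9, -1.0, -0.8), (1.1, -0.7, -1.0), (0.8, -1.1, -0.9), (1.0, -0.9, -1.2),
]
TOY_Y: List[int] = [1, 1, 1, 1, -1, -1, -1, -1]


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score w.x + b of a sample relative to the separating hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,   # L2 regularization strength (assumed value)
    lr: float = 0.1,     # learning rate (assumed value)
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Fit a linear SVM by batch sub-gradient descent on the hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only samples inside the margin contribute to the hinge sub-gradient.
            if yi * decision_value(w, b, xi) < 1:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Classify a sample as +1 (target-like) or -1 (background)."""
    return 1 if decision_value(w, b, x) >= 0 else -1


weights, bias = train_linear_svm(TOY_X, TOY_Y)
train_accuracy = sum(
    predict(weights, bias, x) == label for x, label in zip(TOY_X, TOY_Y)
) / len(TOY_X)
```

On this toy set the learned weight on binding energy is negative and the weights on the two expression features are positive, which mirrors the intuition that stronger binding and higher antisense and siRNA signal mark a candidate target. Accuracy figures such as the 84% and 76% reported above would need to be measured on held-out validated genes, not on the training data as done here.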