Ahmed Firoz, Ansari Hifzur Rahman, Raghava Gajendra P S
Bioinformatics Centre, Institute of Microbial Technology, Chandigarh, India.
BMC Bioinformatics. 2009 Apr 9;10:105. doi: 10.1186/1471-2105-10-105.
MicroRNAs (miRNAs) are produced by the sequential processing of a long hairpin RNA transcript by Drosha and Dicer, both RNase III enzymes, which yields transitory small RNA duplexes. The strand of the duplex that is incorporated into the RNA-induced silencing complex (RISC) and silences gene expression is called the guide strand, or miRNA; the other strand is degraded and is called the passenger strand, or miRNA*. Predicting the guide strand of an miRNA duplex is important for a better understanding of RNA interference pathways.
This paper describes support vector machine (SVM) models developed for predicting the guide strands of miRNAs. All models were trained and tested on a dataset consisting of 329 miRNA and 329 miRNA* pairs using five-fold cross-validation. First, models based on the mono-, di-, and tri-nucleotide composition of miRNA strands achieved highest accuracies of 0.588, 0.638 and 0.596, respectively. Second, models based on split nucleotide composition achieved maximum accuracies of 0.553, 0.641 and 0.602 for mono-, di-, and tri-nucleotides, respectively. Third, models based on binary patterns achieved a highest accuracy of 0.708. Integrating secondary structure features with the binary pattern raised the accuracy to 0.719. Finally, hybrid models that combine various features achieved a maximum accuracy of 0.799, with sensitivity 0.781 and specificity 0.818. When this model was tested on an independent dataset, it achieved an accuracy of 0.80. We also compared the performance of our method with various siRNA-designing methods on miRNA and siRNA datasets.
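As an illustration of the sequence features named above, the following minimal Python sketch computes k-nucleotide composition (k = 1, 2, 3 for mono-, di-, and tri-nucleotides) and a binary pattern for a single strand. It is not the authors' implementation. The function names, the conversion of T to U, the lexicographic feature order, and the zero-padding to a fixed length are all assumptions made for this example; SVM training and the secondary structure features are not shown.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; T is mapped to U below


def kmer_composition(seq, k):
    """Fraction of each of the 4**k possible k-mers (hypothetical helper).

    Features are ordered lexicographically over ACGU. Windows containing
    any other symbol (e.g. N) are not counted, so the fractions can sum
    to less than 1 for such sequences.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of overlapping windows
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    return [counts[m] / total for m in kmers]


def binary_pattern(seq, length=None):
    """One-hot encode each position as 4 bits (hypothetical helper).

    If `length` is given, the sequence is truncated or zero-padded so
    that every strand yields a vector of 4 * length values; this padding
    scheme is an assumption of the sketch.
    """
    seq = seq.upper().replace("T", "U")
    if length is not None:
        seq = seq[:length]
    bits = []
    for base in seq:
        bits.extend(1 if base == symbol else 0 for symbol in ALPHABET)
    if length is not None:
        bits.extend([0] * (4 * (length - len(seq))))
    return bits


if __name__ == "__main__":
    strand = "UGAGGUAGUAGGUUGUAUAGUU"  # example strand for illustration only
    print(len(kmer_composition(strand, 1)))   # 4 features
    print(len(kmer_composition(strand, 2)))   # 16 features
    print(len(kmer_composition(strand, 3)))   # 64 features
    print(len(binary_pattern(strand, length=25)))  # 100 features
```

Vectors like these could be concatenated and passed to any SVM implementation; which combination of features the hybrid models actually use is described in the paper, not in this sketch.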
In this study, a method has been developed for the first time to predict the guide strand of an miRNA duplex. The study demonstrates that the guide and passenger strands of miRNA precursors can be distinguished using their nucleotide sequence and secondary structure. The method will be useful for understanding microRNA processing and can be applied in RNA silencing technology to support biological and clinical research. A web server based on the SVM models described in this study has been developed (http://crdd.osdd.net:8081/RISCbinder/).