PlantMiRNAPred: efficient classification of real and pseudo plant pre-miRNAs.

Affiliation

Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, PR China.

Publication information

Bioinformatics. 2011 May 15;27(10):1368-76. doi: 10.1093/bioinformatics/btr153. Epub 2011 Mar 26.

Abstract

MOTIVATION

MicroRNAs (miRNAs) are short (21-24 nt) non-coding RNAs that play significant roles as post-transcriptional regulators in animals and plants. Some existing methods identify plant precursor miRNAs (pre-miRNAs) by comparative genomics, while others rely on the complementarity between miRNAs and their target mRNA sequences. However, these methods can identify only homologous miRNAs or a limited set of complementary miRNAs. Furthermore, because plant pre-miRNAs differ considerably from animal pre-miRNAs, the ab initio methods developed for animals cannot be applied to plants. It is therefore essential to develop a machine learning method that distinguishes real plant pre-miRNAs from pseudo genome hairpins.

RESULTS

A novel classification method based on a support vector machine (SVM) is proposed specifically for predicting plant pre-miRNAs. To enable efficient prediction, we extract pseudo hairpin sequences from the protein-coding sequences of Arabidopsis thaliana and Glycine max. These pseudo pre-miRNAs are extracted in this study for the first time. A set of informative features is selected to improve classification accuracy, and the training samples are selected according to their distribution in the high-dimensional sample space. Our classifier, PlantMiRNAPred, achieves >90% accuracy on datasets from eight plant species: A. thaliana, Oryza sativa, Populus trichocarpa, Physcomitrella patens, Medicago truncatula, Sorghum bicolor, Zea mays and G. max. The superior performance of the proposed classifier can be attributed to the extracted plant pseudo pre-miRNAs, the selected training dataset and the carefully selected features. The ability of PlantMiRNAPred to distinguish real from pseudo pre-miRNAs provides a viable method for discovering new non-homologous plant pre-miRNAs.
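The abstract does not list the features that PlantMiRNAPred uses, so the following is only an illustrative sketch of the general idea of turning a hairpin sequence into a numeric vector that an SVM could consume. The function name `composition_features` and the choice of GC content plus dinucleotide frequencies are assumptions made for this example, not the authors' feature set.

```python
from itertools import product

# All 16 RNA dinucleotides in a fixed order (AA, AC, ..., UU).
DINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=2)]


def composition_features(seq):
    """Return [GC content] + 16 dinucleotide frequencies for one sequence.

    Hypothetical, illustrative features only; the paper's actual feature
    set is not given in the abstract.
    """
    # Normalise case and treat DNA-style T as RNA U.
    seq = seq.upper().replace("T", "U")
    if not seq:
        raise ValueError("sequence must not be empty")

    # Fraction of G and C bases in the sequence.
    gc_content = (seq.count("G") + seq.count("C")) / len(seq)

    # Overlapping dinucleotides: positions (0,1), (1,2), ...
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = max(len(pairs), 1)  # avoid division by zero for length-1 input
    dinuc_freqs = [pairs.count(d) / total for d in DINUCLEOTIDES]

    return [gc_content] + dinuc_freqs


if __name__ == "__main__":
    vector = composition_features("GGCUAAGCUUAGCC")
    print(len(vector), vector[0])
```

Vectors of this kind, one per real or pseudo hairpin and paired with class labels, could then be passed to any SVM implementation, for example `sklearn.svm.SVC` in scikit-learn.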
