Identification of microRNA precursors with support vector machine and string kernel.

Authors

Xu Jian-Hua, Li Fei, Sun Qiu-Feng

Affiliation

Department of Computer Science, Nanjing Normal University, Nanjing 210097, China.

Publication information

Genomics Proteomics Bioinformatics. 2008 Jun;6(2):121-8. doi: 10.1016/S1672-0229(08)60027-3.

Abstract

MicroRNAs (miRNAs) are a family of short (21-23 nt) regulatory non-coding RNAs processed from long (70-110 nt) miRNA precursors (pre-miRNAs). Distinguishing true from false precursors plays an important role in the computational identification of miRNAs. Numerical features have been extracted from precursor sequences and their secondary structures to suit particular classification methods; however, such features may lose useful discriminative information hidden in the sequences and structures. In this study, pre-miRNA sequences and their secondary structures are used directly to construct an exponential kernel based on the weighted Levenshtein distance between two sequences. This string kernel is then combined with a support vector machine (SVM) to detect true and false pre-miRNAs. Based on 331 training samples of true and false human pre-miRNAs, 2 key parameters of the SVM are selected by 5-fold cross validation and grid search, and 5 realizations with different 5-fold partitions are executed. Among 16 independent test sets from 3 human, 8 animal, 2 plant, 1 virus, and 2 artificially false human pre-miRNAs, our method statistically outperforms the previous SVM-based technique on 11 sets, including 3 human, 7 animal, and 1 false human pre-miRNAs. In particular, pre-miRNAs with multiple loops, which were usually excluded in the previous work, are correctly identified in this study with an accuracy of 92.66%.
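The kernel construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the edit weights, the kernel width, or how sequence and structure strings are combined, so the function names, the `gamma` parameter, and the default weights below are all assumptions chosen for illustration.

```python
import math


def weighted_levenshtein(a, b, w_sub=1.0, w_ins=1.0, w_del=1.0):
    """Weighted edit distance between strings a and b.

    The weights are hypothetical; the abstract does not state the values
    used in the paper.
    """
    # prev[j] holds the distance between the first i-1 chars of a and the first j of b.
    prev = [j * w_ins for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * w_del]
        for j, cb in enumerate(b, start=1):
            sub_cost = 0.0 if ca == cb else w_sub
            curr.append(min(prev[j] + w_del,          # delete ca
                            curr[j - 1] + w_ins,      # insert cb
                            prev[j - 1] + sub_cost))  # match or substitute
        prev = curr
    return prev[-1]


def exponential_kernel(a, b, gamma=0.1, **weights):
    """Assumed kernel form: K(a, b) = exp(-gamma * d(a, b))."""
    return math.exp(-gamma * weighted_levenshtein(a, b, **weights))


def gram_matrix(strings, gamma=0.1, **weights):
    """Pairwise kernel values for a list of strings."""
    return [[exponential_kernel(s, t, gamma, **weights) for t in strings]
            for s in strings]


if __name__ == "__main__":
    # Toy RNA fragments, not real pre-miRNAs.
    toy = ["ACGUACGU", "ACGUACGA", "UUUUACGU"]
    for row in gram_matrix(toy, gamma=0.5):
        print(["%.3f" % value for value in row])
```

A Gram matrix of this kind can be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Kernels built from edit distances are not guaranteed to be positive semi-definite, so this point deserves a check in practice.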
