De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures.

Author information

Ng Kwang Loong Stanley, Mishra Santosh K

Affiliation

Bioinformatics Institute, Singapore.

Publication information

Bioinformatics. 2007 Jun 1;23(11):1321-30. doi: 10.1093/bioinformatics/btm026. Epub 2007 Jan 31.

Abstract

MOTIVATION

MicroRNAs (miRNAs) are small non-coding RNAs (ncRNAs) that participate in diverse cellular and physiological processes through the post-transcriptional gene regulatory pathway. Because the hairpin structure is critically associated with miRNA biogenesis, it is a necessary feature for the computational classification of novel precursor miRNAs (pre-miRs). Although many of the abundant genomic inverted repeats (pseudo hairpins) can be filtered computationally, novel species-specific pre-miRs are likely to remain elusive.

RESULTS

miPred is a de novo Support Vector Machine (SVM) classifier for identifying pre-miRs without relying on phylogenetic conservation. To achieve significantly higher sensitivity and specificity than existing (quasi) de novo predictors, it employs a Gaussian Radial Basis Function (RBF) kernel as a similarity measure over 29 global and intrinsic hairpin folding attributes. These attributes characterize a pre-miR at the dinucleotide sequence, hairpin folding, non-linear statistical thermodynamics and topological levels. Trained on 200 human pre-miRs and 400 pseudo hairpins, miPred achieves a 5-fold cross-validation accuracy of 93.50% and a ROC score of 0.9833. Tested on the remaining 123 human pre-miRs and 246 pseudo hairpins, it reports a sensitivity of 84.55%, a specificity of 97.97% and an accuracy of 93.50%. Validated on 1918 pre-miRs across 40 non-human species and 3836 pseudo hairpins, it yields mean (overall) values of 87.65% (92.08%) for sensitivity, 97.75% (97.42%) for specificity and 94.38% (95.64%) for accuracy. Notably, A.mellifera, A.geoffroyi, C.familiaris, E.Barr, H. Simplex virus, H.cytomegalovirus, O.aries, P.patens, R.lymphocryptovirus, Simian virus and Z.mays are unambiguously classified with 100.00% sensitivity and >93.75% specificity.
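
The reported percentages can be cross-checked against the data set sizes with a little arithmetic. The sketch below is illustrative only and is not taken from the miPred source code. The helper name `rates` is hypothetical, and the correct-call counts (104, 241, 1766, 3737) are not stated in the abstract: they are assumed here as the integers that reproduce the reported figures when rounded to two decimals.

```python
def rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) in percent, rounded to 2 places."""
    sensitivity = 100.0 * tp / (tp + fn)                 # true pre-miRs recovered
    specificity = 100.0 * tn / (tn + fp)                 # pseudo hairpins rejected
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)   # all correct calls
    return tuple(round(v, 2) for v in (sensitivity, specificity, accuracy))

# Human test set: 123 pre-miRs and 246 pseudo hairpins (sizes from the abstract).
# ASSUMPTION: 104 true positives and 241 true negatives are inferred, not reported.
human_test = rates(tp=104, fn=123 - 104, tn=241, fp=246 - 241)

# Non-human validation set: 1918 pre-miRs and 3836 pseudo hairpins (from the abstract).
# ASSUMPTION: 1766 true positives and 3737 true negatives are inferred, not reported.
non_human_overall = rates(tp=1766, fn=1918 - 1766, tn=3737, fp=3836 - 3737)

print(human_test)
print(non_human_overall)
```

Under these assumed counts, the human test set gives 84.55%, 97.97% and 93.50%, and the pooled non-human set gives the "overall" values 92.08%, 97.42% and 95.64%. The "mean" values in the abstract are per-species averages and cannot be recovered from the pooled totals alone.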

AVAILABILITY

Data sets, raw statistical results and source code are available at http://web.bii.a-star.edu.sg/~stanley/Publications
