Sequence-based prediction of microRNA-binding residues in proteins using cost-sensitive Laplacian support vector machines.

Author information

Wu Jian-Sheng, Zhou Zhi-Hua

Affiliations

Nanjing University and Nanjing University of Posts and Telecommunications, Nanjing.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2013 May-Jun;10(3):752-9. doi: 10.1109/TCBB.2013.75.

DOI:10.1109/TCBB.2013.75

PMID:24091407

Abstract

The recognition of microRNA (miRNA)-binding residues in proteins helps explain how miRNAs silence their target genes. Existing computational methods are difficult to apply to predicting miRNA-binding residues in proteins because training examples are scarce. To address this issue, unlabeled data may be exploited to help construct a computational model. Semisupervised learning covers methods that automatically exploit unlabeled data in addition to labeled data to improve learning performance, with no human intervention assumed. In addition, miRNA-binding proteins almost always contain far fewer binding than nonbinding residues, and cost-sensitive learning is regarded as a good solution to this class imbalance problem. In this work, a novel model is proposed for recognizing miRNA-binding residues in proteins from sequences using a cost-sensitive extension of Laplacian support vector machines (CS-LapSVM) with a hybrid feature. The hybrid feature consists of evolutionary information of the amino acid sequence (position-specific scoring matrices), conservation information about three biochemical properties (HKM), and mutual interaction propensities in protein-miRNA complex structures. The CS-LapSVM achieves an F1 score of 26.23 ± 2.55% and an AUC value of 0.805 ± 0.020, superior to existing approaches for the recognition of RNA-binding residues. A web server called SARS has been built and is freely available for academic use.
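The two ideas the abstract combines, a graph-Laplacian smoothness term computed over labeled and unlabeled residues and per-class misclassification costs, can be illustrated with a small sketch. This is not the authors' CS-LapSVM: it swaps the SVM hinge loss for a squared loss (a linear, cost-weighted Laplacian regularized least squares model) so that it has a closed-form solution. The function names, the kNN graph construction, and all parameter values (`cost_pos`, `cost_neg`, `gamma_a`, `gamma_i`, `k`) are assumptions made for illustration only.

```python
import numpy as np


def knn_graph_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - W of a symmetrized binary kNN graph."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared distances
    np.fill_diagonal(d2, np.inf)                     # exclude self-neighbors
    idx = np.argsort(d2, axis=1)[:, :k]              # k nearest neighbors
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)                           # make the graph symmetric
    return np.diag(W.sum(axis=1)) - W


def fit_cs_laprls(X_lab, y_lab, X_unlab, cost_pos=5.0, cost_neg=1.0,
                  gamma_a=1e-2, gamma_i=1e-2, k=5):
    """Cost-weighted linear Laplacian RLS (illustrative, not CS-LapSVM).

    Minimizes  sum_i c_i (y_i - w.x_i)^2 + gamma_a |w|^2 + gamma_i f'Lf,
    where f = X_all w covers labeled and unlabeled points, and c_i is
    cost_pos for y_i = +1 (binding) and cost_neg for y_i = -1.
    """
    # Append a constant column so the last weight acts as a bias term.
    add_bias = lambda X: np.hstack([X, np.ones((X.shape[0], 1))])
    L = knn_graph_laplacian(np.vstack([X_lab, X_unlab]), k)
    Xl, Xa = add_bias(X_lab), add_bias(np.vstack([X_lab, X_unlab]))
    c = np.where(y_lab > 0, cost_pos, cost_neg)
    A = (Xl.T @ (c[:, None] * Xl)
         + gamma_a * np.eye(Xl.shape[1])
         + gamma_i * Xa.T @ L @ Xa)
    return np.linalg.solve(A, Xl.T @ (c * y_lab))


def predict(w, X):
    """Sign of the linear score; +1 means predicted binding residue."""
    return np.sign(np.hstack([X, np.ones((X.shape[0], 1))]) @ w)
```

Raising `cost_pos` relative to `cost_neg` penalizes errors on the rare binding class more heavily, which is the usual cost-sensitive response to class imbalance; the `gamma_i` term is where the unlabeled residues influence the fit.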

Similar articles

Prediction of microRNA-binding residues in protein using a Laplacian support vector machine based on sequence information.

J Bioinform Comput Biol. 2018 Jun;16(3):1840009. doi: 10.1142/S0219720018400097. Epub 2018 Feb 4.

Sequence-based prediction of DNA-binding residues in proteins with conservation and correlation information.

IEEE/ACM Trans Comput Biol Bioinform. 2012 Nov-Dec;9(6):1766-75. doi: 10.1109/TCBB.2012.106.

PCVMZM: Using the Probabilistic Classification Vector Machines Model Combined with a Zernike Moments Descriptor to Predict Protein-Protein Interactions from Protein Sequences.

Int J Mol Sci. 2017 May 11;18(5):1029. doi: 10.3390/ijms18051029.

Robust and accurate prediction of protein self-interactions from amino acids sequence using evolutionary information.

Mol Biosyst. 2016 Nov 15;12(12):3702-3710. doi: 10.1039/c6mb00599c.

Sequence-Based Prediction of Protein-Carbohydrate Binding Sites Using Support Vector Machines.

J Chem Inf Model. 2016 Oct 24;56(10):2115-2122. doi: 10.1021/acs.jcim.6b00320. Epub 2016 Sep 22.

SVM based prediction of RNA-binding proteins using binding residues and evolutionary information.

J Mol Recognit. 2011 Mar-Apr;24(2):303-13. doi: 10.1002/jmr.1061.

DNABind: a hybrid algorithm for structure-based prediction of DNA-binding residues by combining machine learning- and template-based approaches.

Proteins. 2013 Nov;81(11):1885-99. doi: 10.1002/prot.24330. Epub 2013 Aug 16.

TargetDBP: Accurate DNA-Binding Protein Prediction Via Sequence-Based Multi-View Feature Learning.

IEEE/ACM Trans Comput Biol Bioinform. 2020 Jul-Aug;17(4):1419-1429. doi: 10.1109/TCBB.2019.2893634. Epub 2019 Jan 18.

New support vector machine-based method for microRNA target prediction.

Genet Mol Res. 2014 Jun 9;13(2):4165-76. doi: 10.4238/2014.June.9.3.

Cited by

Time-Shift Multiscale Fuzzy Entropy and Laplacian Support Vector Machine Based Rolling Bearing Fault Diagnosis.

Entropy (Basel). 2018 Aug 13;20(8):602. doi: 10.3390/e20080602.

Predicting RNA secondary structure via adaptive deep recurrent neural networks with energy-based filter.

BMC Bioinformatics. 2019 Dec 24;20(Suppl 25):684. doi: 10.1186/s12859-019-3258-7.

Kinase Identification with Supervised Laplacian Regularized Least Squares.

PLoS One. 2015 Oct 8;10(10):e0139676. doi: 10.1371/journal.pone.0139676. eCollection 2015.
