Sequence-based prediction of microRNA-binding residues in proteins using cost-sensitive Laplacian support vector machines.

Authors

Wu Jian-Sheng, Zhou Zhi-Hua

Affiliations

Nanjing University and Nanjing University of Posts and Telecommunications, Nanjing.

Publication

IEEE/ACM Trans Comput Biol Bioinform. 2013 May-Jun;10(3):752-9. doi: 10.1109/TCBB.2013.75.

Abstract

Recognizing microRNA (miRNA)-binding residues in proteins helps explain how miRNAs silence their target genes. Existing computational methods are difficult to apply to this task because few training examples are available. To address this issue, unlabeled data may be exploited to help construct a computational model. Semisupervised learning comprises methods that automatically exploit unlabeled data in addition to labeled data to improve learning performance, with no human intervention assumed. In addition, miRNA-binding proteins almost always contain far fewer binding than nonbinding residues, and cost-sensitive learning is regarded as a good solution to this class imbalance problem. In this work, a novel model is proposed for recognizing miRNA-binding residues in proteins from sequences using a cost-sensitive extension of Laplacian support vector machines (CS-LapSVM) with a hybrid feature. The hybrid feature consists of evolutionary information of the amino acid sequence (position-specific scoring matrices), conservation information about three biochemical properties (HKM), and mutual interaction propensities in protein-miRNA complex structures. The CS-LapSVM achieves an F1 score of 26.23 ± 2.55% and an AUC value of 0.805 ± 0.020, outperforming existing approaches for the recognition of RNA-binding residues. A web server called SARS has been built and is freely available for academic use.
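
As a rough illustration of the two ideas the abstract relies on for class imbalance, the sketch below weights a hinge loss by per-class costs and computes an F1 score. It is not the CS-LapSVM from the paper: the manifold (Laplacian) regularization term and the hybrid features are omitted, the inverse-frequency cost rule is an assumption, and the function names and toy labels are made up for illustration.

```python
from collections import Counter


def balanced_class_costs(labels):
    """Assumed cost rule: cost of a class is inversely proportional to its frequency."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def cost_sensitive_hinge_loss(scores, labels, costs):
    """Mean hinge loss with each example weighted by the cost of its true class."""
    total = 0.0
    for s, y in zip(scores, labels):
        total += costs[y] * max(0.0, 1.0 - y * s)
    return total / len(labels)


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): 2 binding residues (+1) and 8 nonbinding residues (-1).
labels = [1, 1] + [-1] * 8
costs = balanced_class_costs(labels)

# Hypothetical decision scores: one binding residue sits on the decision boundary.
scores = [2.0, 0.0] + [-2.0] * 8
loss = cost_sensitive_hinge_loss(scores, labels, costs)

# Hypothetical predictions: one true positive, one false negative, one false positive.
predictions = [1, -1, 1] + [-1] * 7
f1 = f1_score(labels, predictions)
```

With costs like these, an error on a rare binding residue contributes more to the loss than an error on a nonbinding one, which is the general motivation for cost-sensitive learning under class imbalance.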
