An improved sequence based prediction protocol for DNA-binding proteins using SVM and comprehensive feature analysis.

Affiliation

Shanghai Key Laboratory of New Drug Design, State Key Laboratory of Bioreactor Engineering, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China.

Publication information

BMC Bioinformatics. 2013 Mar 9;14:90. doi: 10.1186/1471-2105-14-90.

Abstract

BACKGROUND

DNA-binding proteins (DNA-BPs) play a pivotal role in both eukaryotic and prokaryotic proteomes. Several computational methods for identifying DNA-BPs have been proposed in the literature, and many of the informative features and properties they use have been shown to have a significant impact on this problem. However, the ultimate goal of bioinformatics is to be able to predict DNA-BPs directly from the primary sequence.

RESULTS

In this work, we focus on how to transform these informative features appropriately into a uniform numeric representation and thereby improve the prediction accuracy of our SVM-based classifier for DNA-BPs. We systematically investigate representations of selected features that are known to perform well. First, four kinds of protein properties are obtained and used to describe the protein sequence. Second, three feature transformation methods (OCTD, AC and SAA) are adopted to obtain numeric feature vectors at three main levels of the protein sequence (global, nonlocal and local), and their performance is exhaustively investigated. Finally, the mRMR-IFS feature selection method and an ensemble learning approach are used to determine the best prediction model. In addition, the optimal features selected by mRMR-IFS are examined in light of the observed results, which may provide useful insights into the mechanisms of protein-DNA interactions. In five-fold cross-validation on the DNAdset and DNAaset datasets, we obtained overall accuracies of 0.940 and 0.811 and Matthews correlation coefficients (MCC) of 0.881 and 0.614, respectively.
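As a rough illustration of how a variable-length sequence can become a fixed-length numeric vector, the sketch below computes a plain auto-covariance over one property scale and also shows how MCC is derived from a confusion matrix. It rests on several assumptions: the abstract does not define AC, so a standard auto-covariance formula is used here and may differ from the authors' exact transform; the Kyte-Doolittle hydropathy scale stands in for the four unnamed protein properties; and the function names `auto_covariance` and `mcc` are hypothetical.

```python
from math import sqrt

# Kyte-Doolittle hydropathy values. ASSUMPTION: used only as an example
# property scale; the abstract does not name the four properties it uses.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(sequence, scale, max_lag):
    """Return [AC(1), ..., AC(max_lag)] for one property scale.

    AC(lag) is the mean product of mean-centered property values of
    residues that are `lag` positions apart (a generic definition,
    not necessarily the one used in the paper).
    """
    values = [scale[aa] for aa in sequence]
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    features = []
    for lag in range(1, max_lag + 1):
        total = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from DNAdset or DNAaset.
    print(auto_covariance("MKRAILKRAV", HYDROPATHY, max_lag=3))
    print(mcc(tp=45, tn=40, fp=10, fn=5))
```

Whatever the sequence length, this transform yields one value per lag, so applying it to several property scales and concatenating the results gives every protein a vector of the same size, which is the form an SVM requires.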

CONCLUSIONS

These results suggest that an entirely sequence-based protocol, which transforms and integrates informative features from different scales for use by an SVM, can be developed efficiently to predict DNA-BPs accurately. Moreover, a novel systematic framework for sequence descriptor-based protein function prediction is proposed here.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a38b/3602657/ba77c6cb91bb/1471-2105-14-90-1.jpg
