Prediction of small molecule binding property of protein domains with Bayesian classifiers based on Markov chains.

Affiliation

Theoretical Bioinformatics Department, German Cancer Research Center, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany.

Publication information

Comput Biol Chem. 2009 Dec;33(6):457-60. doi: 10.1016/j.compbiolchem.2009.09.005. Epub 2009 Oct 9.

DOI:10.1016/j.compbiolchem.2009.09.005

PMID:19892602

Abstract

Accurate computational methods that help predict the biological function of a protein from its sequence are of great interest to research biologists and pharmaceutical companies. One approach to inferring protein function is to predict the interactions between proteins and other molecules. In this work, we propose a machine learning method that uses the primary sequence of a domain to predict its propensity for interaction with small molecules. By curating the Pfam database with respect to the small molecule binding ability of its component domains, we constructed a dataset of small molecule binding and non-binding domains. This dataset was then used as a training set for a Bayesian classifier intended to distinguish members of the two classes. The domain sequences of both classes are modelled with Markov chains. In a jackknife test, our classification procedure achieved predictive accuracies of 77.2% and 66.7% for the binding and non-binding classes, respectively. We demonstrate the applicability of our classifier by using it to identify previously unknown small molecule binding domains. Our predictions are available as supplementary material and can provide useful information to drug discovery specialists. Given the ubiquitous and essential role small molecules play in biological processes, our method is important for identifying pharmaceutically relevant components of complete proteomes. The software is available from the author upon request.
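The general idea in the abstract (one Markov chain per class, combined through Bayes' rule, evaluated by a jackknife test) can be illustrated with a minimal sketch. It is not the authors' software: the first-order chain, the add-one pseudocount, priors taken from class frequencies, and all function names (train_chain, log_likelihood, classify, jackknife) are assumptions made here for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def train_chain(seqs, pseudo=1.0):
    """Estimate a first-order Markov chain (assumed order) with pseudocounts.

    Returns (log initial probabilities, log transition probabilities).
    """
    init, trans = Counter(), Counter()
    for seq in seqs:
        s = [c for c in seq if c in ALPHABET]  # skip non-standard residues
        if not s:
            continue
        init[s[0]] += 1
        trans.update(zip(s, s[1:]))  # count adjacent residue pairs
    k = len(ALPHABET)
    init_total = sum(init.values()) + pseudo * k
    log_init = {a: math.log((init[a] + pseudo) / init_total) for a in ALPHABET}
    log_trans = {}
    for a in ALPHABET:
        row_total = sum(trans[(a, b)] for b in ALPHABET) + pseudo * k
        for b in ALPHABET:
            log_trans[(a, b)] = math.log((trans[(a, b)] + pseudo) / row_total)
    return log_init, log_trans


def log_likelihood(seq, chain):
    """Log probability of a sequence under one class-specific chain."""
    log_init, log_trans = chain
    s = [c for c in seq if c in ALPHABET]
    if not s:
        return 0.0
    return log_init[s[0]] + sum(log_trans[pair] for pair in zip(s, s[1:]))


def classify(seq, chains, log_priors):
    """Bayes rule: pick the class maximising log prior + log likelihood."""
    return max(chains, key=lambda c: log_priors[c] + log_likelihood(seq, chains[c]))


def jackknife(data):
    """Leave-one-out test on a list of (sequence, label); per-class accuracy."""
    correct, total = Counter(), Counter()
    for i, (seq, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # retrain without the held-out item
        labels = {lab for _, lab in rest}
        chains = {lab: train_chain([s for s, m in rest if m == lab]) for lab in labels}
        log_priors = {
            lab: math.log(sum(1 for _, m in rest if m == lab) / len(rest))
            for lab in labels
        }
        total[label] += 1
        correct[label] += classify(seq, chains, log_priors) == label
    return {lab: correct[lab] / total[lab] for lab in total}
```

Calling jackknife on a list of (domain sequence, "binding" or "non-binding") pairs returns one accuracy per class, which is the form in which the abstract reports its 77.2% and 66.7% figures. Working in log space avoids numerical underflow on long domain sequences.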

Similar articles

Prediction of small molecule binding property of protein domains with Bayesian classifiers based on Markov chains.

Comput Biol Chem. 2009 Dec;33(6):457-60. doi: 10.1016/j.compbiolchem.2009.09.005. Epub 2009 Oct 9.

Domain-based small molecule binding site annotation.

BMC Bioinformatics. 2006 Mar 17;7:152. doi: 10.1186/1471-2105-7-152.

Accurate domain identification with structure-anchored hidden Markov models, saHMMs.

Proteins. 2009 Aug 1;76(2):343-52. doi: 10.1002/prot.22349.

Sequence-based prediction of protein interaction sites with an integrative method.

Bioinformatics. 2009 Mar 1;25(5):585-91. doi: 10.1093/bioinformatics/btp039. Epub 2009 Jan 19.

Protein classification based on text document classification techniques.

Proteins. 2005 Mar 1;58(4):955-70. doi: 10.1002/prot.20373.

Prediction of protein-RNA binding sites by a random forest method with combined features.

Bioinformatics. 2010 Jul 1;26(13):1616-22. doi: 10.1093/bioinformatics/btq253. Epub 2010 May 18.

AutoSCOP: automated prediction of SCOP classifications using unique pattern-class mappings.

Bioinformatics. 2007 May 15;23(10):1203-10. doi: 10.1093/bioinformatics/btm089. Epub 2007 Mar 22.

Computational prediction of protein-protein interactions.

Methods Mol Biol. 2004;261:445-68. doi: 10.1385/1-59259-762-9:445.

Detection of new protein domains using co-occurrence: application to Plasmodium falciparum.

Bioinformatics. 2009 Dec 1;25(23):3077-83. doi: 10.1093/bioinformatics/btp560. Epub 2009 Sep 28.

Predicting rRNA-, RNA-, and DNA-binding proteins from primary structure with support vector machines.

J Theor Biol. 2006 May 21;240(2):175-84. doi: 10.1016/j.jtbi.2005.09.018. Epub 2005 Nov 7.

Cited by

Identification of Bacterial Cell Wall Lyases via Pseudo Amino Acid Composition.

Biomed Res Int. 2016;2016:1654623. doi: 10.1155/2016/1654623. Epub 2016 Jun 29.
