LIBRUS: combined machine learning and homology information for sequence-based ligand-binding residue prediction.

Affiliation

Department of Computer Science, University of Minnesota, 117 Pleasant St SE, Room 464, Minneapolis, MN 55455, USA.

Publication information

Bioinformatics. 2009 Dec 1;25(23):3099-107. doi: 10.1093/bioinformatics/btp561. Epub 2009 Sep 28.

DOI:10.1093/bioinformatics/btp561

PMID:19786483

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3167698/

Abstract

MOTIVATION

Identifying residues that interact with ligands is a useful first step toward understanding protein function and an aid to designing small molecules that target the protein. Several studies have shown that sequence features are very informative for this type of prediction, while structure features have also been useful when structure is available. We develop a sequence-based method, called LIBRUS, that combines homology-based transfer with direct prediction using machine learning, and we compare it to previous sequence-based work and to current structure-based methods.
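The two ingredients named above, homology-based transfer and a direct per-residue prediction, can be illustrated with a minimal Python sketch. It is not the LIBRUS implementation: the function names, the gapped-string alignment format, and the weighted-average combination are all assumptions made here for illustration, and the abstract does not specify how the two signals are actually combined.

```python
def transfer_binding_labels(query_aln, template_aln, template_binding):
    """Copy binding annotations from a homologous template onto a query.

    query_aln, template_aln: equal-length gapped strings ('-' marks a gap)
    from a pairwise alignment (hypothetical input format).
    template_binding: set of 0-based template residue indices known to bind.
    Returns one 0/1 label per query residue.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    labels = []
    t_idx = 0  # index of the next ungapped template residue
    for q_char, t_char in zip(query_aln, template_aln):
        t_pos = None
        if t_char != "-":
            t_pos = t_idx
            t_idx += 1
        if q_char != "-":
            # A query residue inherits a label only if it is aligned
            # to a template residue annotated as ligand-binding.
            hit = t_pos is not None and t_pos in template_binding
            labels.append(1 if hit else 0)
    return labels


def combine_scores(transfer_labels, ml_scores, weight=0.5):
    """Blend transferred labels with learner scores.

    The weighted average is a placeholder assumption, not the
    combination scheme used by LIBRUS.
    """
    if len(transfer_labels) != len(ml_scores):
        raise ValueError("inputs must have one entry per query residue")
    return [weight * t + (1.0 - weight) * s
            for t, s in zip(transfer_labels, ml_scores)]
```

For example, `transfer_binding_labels("MK-LVG", "MKALV-", {1, 3})` labels the query's K and L as binding because they align to annotated template positions, while the unaligned G receives no label.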

RESULTS

Our analysis shows that homology-based transfer is slightly more discriminating than a support vector machine learner using profiles and predicted secondary structure. We combine these two approaches in a method called LIBRUS. On a benchmark of 885 sequence-independent proteins, it achieves an area under the ROC curve (ROC) of 0.83 with 45% precision at 50% recall, a significant improvement over previous sequence-based efforts. On an independent benchmark set, FINDSITE, a current method based on structure features, achieves an ROC of 0.81 with 54% precision at 50% recall, while LIBRUS achieves an ROC of 0.82 with 39% precision at 50% recall at a smaller computational cost. When LIBRUS and FINDSITE predictions are combined, performance exceeds that of either method alone, reaching an ROC of 0.86 and 59% precision at 50% recall.
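The two metrics reported here, area under the ROC curve and precision at 50% recall, can be computed from per-residue scores roughly as follows. This is a minimal sketch with hypothetical function names, not the evaluation code used in the study; it assumes binary labels (1 = binding residue) and that higher scores mean a residue is more likely to bind.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive outscores a random
    negative; tied pairs count as 0.5. Quadratic time, fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_at_recall(labels, scores, target_recall=0.5):
    """Precision at the first ranked cutoff whose recall reaches the target.

    Residues are ranked by descending score; tied scores are not
    treated specially in this sketch.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for rank, (_, y) in enumerate(ranked, start=1):
        true_pos += y
        if true_pos / total_pos >= target_recall:
            return true_pos / rank
    # Reached only if target_recall exceeds 1.0.
    return total_pos / len(ranked)
```

Under these definitions, "45% precision at 50% recall" means that at the score cutoff where half of the true binding residues are recovered, 45% of the residues predicted as binding are correct.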

AVAILABILITY

Software developed for this study is available at http://bioinfo.cs.umn.edu/supplements/binf2009 along with Supplementary data on the study.
