A comparative study of family-specific protein-ligand complex affinity prediction based on random forest approach.

Author information

Wang Yu, Guo Yanzhi, Kuang Qifan, Pu Xuemei, Ji Yue, Zhang Zhihang, Li Menglong

Affiliation

College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, People's Republic of China.

Publication information

J Comput Aided Mol Des. 2015 Apr;29(4):349-60. doi: 10.1007/s10822-014-9827-y. Epub 2014 Dec 20.

DOI:10.1007/s10822-014-9827-y

PMID:25527073

Abstract

Assessing the binding affinity between ligands and their target proteins plays an essential role in drug discovery and design. As an alternative to widely used scoring approaches, machine learning methods have been proposed for fast binding affinity prediction, with promising results. Most of them, however, were developed as all-purpose models that disregard the specific functions of different protein families, even though proteins from different functional families typically differ in structure and physicochemical features. In this study, we propose a random forest method that predicts protein-ligand binding affinity from a comprehensive feature set covering protein sequence, binding pocket, ligand structure and intermolecular interaction. Feature processing and compression were carried out separately for each protein family dataset. The results indicate that different features contribute to different models, so an individual representation for each protein family is necessary. Three family-specific models were constructed for three important protein target families: HIV-1 protease, trypsin and carbonic anhydrase. For comparison, two generic models covering diverse protein families were also built. The evaluation shows that models built on family-specific datasets outperform those built on the generic datasets. On the test sets, the Pearson correlation coefficients (Rp) are 0.740, 0.874 and 0.735, and the Spearman correlation coefficients (Rs) are 0.697, 0.853 and 0.723, for HIV-1 protease, trypsin and carbonic anhydrase respectively. Comparisons with other methods further demonstrate that individual representation and model construction for each protein family is a more reasonable way to predict affinity within a particular protein family.
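The two evaluation metrics named in the abstract, Rp and Rs, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the `(family, measured, predicted)` record layout and the toy numbers are all assumptions made for illustration, and no random forest is trained here. The sketch only shows how predicted affinities could be scored per protein family once a model has produced them.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (Rp) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation (Rs): Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate_by_family(records):
    """Group (family, measured, predicted) triples and score each family.

    Returns {family: (Rp, Rs)}. The record layout is hypothetical.
    """
    grouped = defaultdict(lambda: ([], []))
    for family, measured, predicted in records:
        grouped[family][0].append(measured)
        grouped[family][1].append(predicted)
    return {f: (pearson(m, p), spearman(m, p)) for f, (m, p) in grouped.items()}


# Made-up toy affinities, NOT data from the paper.
toy = [
    ("family_a", 5.0, 5.2), ("family_a", 6.0, 5.9),
    ("family_a", 7.0, 7.4), ("family_a", 8.0, 7.1),
    ("family_b", 4.0, 4.5), ("family_b", 5.5, 5.0), ("family_b", 6.5, 6.8),
]
scores = evaluate_by_family(toy)
for family, (rp, rs) in sorted(scores.items()):
    print(f"{family}: Rp={rp:.3f} Rs={rs:.3f}")
```

Scoring each family separately, rather than pooling all complexes, mirrors the per-family reporting in the abstract: Rp reflects how linearly predictions track measured affinity, while Rs reflects only whether the ranking of complexes is preserved.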
