Random Forest Refinement of the KECSA2 Knowledge-Based Scoring Function for Protein Decoy Detection.

Affiliations

Department of Chemistry, Michigan State University, 578 S. Shaw Lane, East Lansing, Michigan 48824, United States.

Institute for Cyber Enabled Research, Michigan State University, 567 Wilson Road, East Lansing, Michigan 48824, United States.

Publication information

J Chem Inf Model. 2019 May 28;59(5):1919-1929. doi: 10.1021/acs.jcim.8b00734. Epub 2019 Feb 20.

DOI:10.1021/acs.jcim.8b00734

PMID:30726079

Abstract

Knowledge-based potentials generally perform better than physics-based scoring functions at identifying the native structure within a collection of decoy protein structures. By using a reference state, the pure interactions between atom/residue pairs can be obtained by removing the contributions of ideal-gas-state potentials. However, conventional knowledge-based potentials have difficulty assigning different importance factors to different atom/residue pairs. In this work, using the "comparison" concept, we successfully trained Random Forest (RF) models on unbalanced data sets; these models assign different importance factors to the atom pair potentials and thereby improve their ability to distinguish native proteins from decoys. The performance of the RF models was tested on 12 decoy sets, both individually and combined. We find that the RF models increase the recognition of native structures without reducing their ability to identify the best decoy structures. To test whether the RF algorithm can build useful models from physically unrealistic inputs, we also created models using scrambled atom types, which produce physically unrealistic probability functions. From this test, we find that models built on the scrambled probability functions are not of similar quality to those built on the unscrambled ones. Next, we created uniform probability functions in which the peak positions are the same as in the original functions but every interaction has the same peak height. Models built on these uniform potentials were as good as those built on the full potentials, suggesting that the experimental peak positions are all that matters in these models. The KECSA2 potential and all code used in this work are available at https://github.com/JunPei000/protein_folding-decoy-set.
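The abstract rests on three ingredients: a distance-dependent pair potential referenced to an ideal-gas-like state, a per-atom-pair-type breakdown of the score whose terms a Random Forest can weight differently, and a "uniform" control in which every pair type has the same peak height. The sketch below illustrates those ideas generically; it is not the KECSA2 formulation. The inverse-Boltzmann form u(r) = -kT ln(P_obs(r) / P_ref(r)), the 0.5 Å bins, the pseudocount, the rescaling used for the uniform variant, and all function names are assumptions made for illustration. The paper's "comparison" construction is not reproduced here.

```python
import math
from collections import defaultdict

KT = 0.593        # kcal/mol near 298 K (assumed; only rescales energies)
BIN_WIDTH = 0.5   # Angstrom per distance bin (hypothetical choice)


def _ratio(obs_counts, ref_counts, pseudo=1e-6):
    """P_obs(r) / P_ref(r) per bin; the pseudocount avoids log(0)."""
    n_obs, n_ref = sum(obs_counts), sum(ref_counts)
    return [(o / n_obs + pseudo) / (r / n_ref + pseudo)
            for o, r in zip(obs_counts, ref_counts)]


def pair_potential(obs_counts, ref_counts):
    """Inverse-Boltzmann energy per bin: u(r) = -kT ln(P_obs(r) / P_ref(r))."""
    return [-KT * math.log(g) for g in _ratio(obs_counts, ref_counts)]


def uniform_potential(obs_counts, ref_counts, height=2.0):
    """Hypothetical 'uniform' variant: rescale the ratio so that its peak equals
    `height` for every pair type, while the peak position stays unchanged."""
    ratio = _ratio(obs_counts, ref_counts)
    scale = height / max(ratio)
    return [-KT * math.log(g * scale) for g in ratio]


def structure_features(pairs, potentials, types):
    """Per-pair-type energy sums for one structure.

    pairs: iterable of (pair_type, distance_in_angstrom)
    potentials: dict mapping pair_type -> list of per-bin energies
    types: fixed ordering, so every structure yields a vector with the same layout
    """
    totals = defaultdict(float)
    for ptype, dist in pairs:
        table = potentials[ptype]
        b = int(dist / BIN_WIDTH)
        if b < len(table):  # pairs beyond the last bin are ignored (cutoff)
            totals[ptype] += table[b]
    return [totals[t] for t in types]


def conventional_score(features):
    """Classic knowledge-based score: every pair type gets the same weight of 1."""
    return sum(features)


def labelled_dataset(native_pairs, decoy_pairs_list, potentials, types):
    """One native (label 1) against many decoys (label 0): an unbalanced set."""
    X = [structure_features(native_pairs, potentials, types)]
    y = [1]
    for pairs in decoy_pairs_list:
        X.append(structure_features(pairs, potentials, types))
        y.append(0)
    return X, y
```

`conventional_score` makes the limitation described in the abstract explicit: every pair type contributes with equal weight. The rows of `X` and labels `y` could instead be passed to a classifier such as scikit-learn's `RandomForestClassifier`, optionally with `class_weight="balanced"` to offset the single native per decoy set. Its `feature_importances_` attribute is only loosely analogous to per-pair-type importance factors.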

Similar articles

Random Forest Refinement of Pairwise Potentials for Protein-Ligand Decoy Detection.

J Chem Inf Model. 2019 Jul 22;59(7):3305-3315. doi: 10.1021/acs.jcim.9b00356. Epub 2019 Jul 2.

Pair Potentials as Machine Learning Features.

J Chem Theory Comput. 2020 Aug 11;16(8):5385-5400. doi: 10.1021/acs.jctc.9b01246. Epub 2020 Jul 6.

FFENCODER-PL: Pair Wise Energy Descriptors for Protein-Ligand Pose Selection.

J Chem Theory Comput. 2021 Oct 12;17(10):6647-6657. doi: 10.1021/acs.jctc.1c00503. Epub 2021 Sep 23.

Refinement of pairwise potentials via logistic regression to score protein-protein interactions.

Proteins. 2020 Dec;88(12):1559-1568. doi: 10.1002/prot.25973. Epub 2020 Jul 30.

Novel nonlinear knowledge-based mean force potentials based on machine learning.

IEEE/ACM Trans Comput Biol Bioinform. 2011 Mar-Apr;8(2):476-86. doi: 10.1109/TCBB.2010.86.

An accurate, residue-level, pair potential of mean force for folding and binding based on the distance-scaled, ideal-gas reference state.

Protein Sci. 2004 Feb;13(2):400-11. doi: 10.1110/ps.03348304.

Distinguishing native conformations of proteins from decoys with an effective free energy estimator based on the OPLS all-atom force field and the Surface Generalized Born solvent model.

Proteins. 2002 Aug 1;48(2):404-22. doi: 10.1002/prot.10171.

Improved protein structure selection using decoy-dependent discriminatory functions.

BMC Struct Biol. 2004 Jun 18;4:8. doi: 10.1186/1472-6807-4-8.

Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction.

Protein Sci. 2002 Nov;11(11):2714-26. doi: 10.1110/ps.0217002.

Cited by

A simple neural network implementation of generalized solvation free energy for assessment of protein structural models.

RSC Adv. 2019 Nov 6;9(62):36227-36233. doi: 10.1039/c9ra05168f. eCollection 2019 Nov 4.

Comprehensive strategies of machine-learning-based quantitative structure-activity relationship models.

iScience. 2021 Aug 28;24(9):103052. doi: 10.1016/j.isci.2021.103052. eCollection 2021 Sep 24.

Nonparametric chemical descriptors for the calculation of ligand-biopolymer affinities with machine-learning scoring functions.

J Comput Aided Mol Des. 2019 Nov;33(11):943-953. doi: 10.1007/s10822-019-00248-2. Epub 2019 Nov 14.
