

Tapping on the Black Box: How Is the Scoring Power of a Machine-Learning Scoring Function Dependent on the Training Set?

Affiliations

State Key Laboratory of Bioorganic and Natural Products Chemistry, Center for Excellence in Molecular Synthesis, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Lingling Road, Shanghai 200032, People's Republic of China.

University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China.

Publication Information

J Chem Inf Model. 2020 Mar 23;60(3):1122-1136. doi: 10.1021/acs.jcim.9b00714. Epub 2020 Mar 3.

Abstract

In recent years, protein-ligand interaction scoring functions derived through machine learning have repeatedly been reported to outperform conventional scoring functions. However, several published studies have questioned whether the superior performance of machine-learning scoring functions depends on the overlap between the training set and the test set. To examine the true power of machine-learning algorithms in scoring function formulation, we conducted a systematic study of six off-the-shelf machine-learning algorithms: Bayesian Ridge Regression (BRR), Decision Tree (DT), K-Nearest Neighbors (KNN), Multilayer Perceptron (MLP), Linear Support Vector Regression (L-SVR), and Random Forest (RF). Model scoring functions were derived with these algorithms on various training sets selected from over 3700 protein-ligand complexes in the PDBbind refined set (version 2016). All resulting scoring functions were then applied to the CASF-2016 test set to validate their scoring power. In our first series of trials, the size of the training set was fixed while the overall similarity between the training set and the test set was varied systematically. In our second series of trials, the overall similarity between the training set and the test set was fixed while the size of the training set was varied. Our results indicate that the performance of these machine-learning models is more or less dependent on the contents or the size of the training set, with the RF model demonstrating the best learning capability. In contrast, the performance of three conventional scoring functions (i.e., ChemScore, ASP, and X-Score) is basically insensitive to the use of different training sets. Therefore, one has to consider not only "hard overlap" but also "soft overlap" between the training set and the test set in order to evaluate machine-learning scoring functions.
In this spirit, we have compiled data sets based on the PDBbind refined set by removing redundant samples under several similarity thresholds. Scoring function developers are encouraged to employ them as standard training sets if they want to evaluate their new models on the CASF-2016 benchmark.
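The evaluation protocol described above can be sketched in code. The following is a minimal illustration, not the authors' actual pipeline: it trains the six off-the-shelf algorithms named in the abstract (via their scikit-learn implementations) on a synthetic stand-in for protein-ligand descriptors, then measures each model's scoring power as the Pearson correlation coefficient between predicted and "experimental" affinities on a held-out set, which is the metric CASF-2016 uses. The feature matrix, affinity values, and train/test split here are all invented for demonstration.

```python
# Hedged sketch of the scoring-power evaluation loop. All data below is
# synthetic; in the actual study the features would be descriptors computed
# from PDBbind complexes and the targets experimental binding affinities.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import BayesianRidge
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import LinearSVR
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Synthetic stand-in: 400 "complexes" x 20 descriptors, with affinities
# generated as a noisy linear function of the descriptors.
X = rng.normal(size=(400, 20))
y = X @ rng.normal(size=20) + rng.normal(scale=0.5, size=400)
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

# The six off-the-shelf algorithms compared in the paper.
models = {
    "BRR": BayesianRidge(),
    "DT": DecisionTreeRegressor(random_state=0),
    "KNN": KNeighborsRegressor(),
    "MLP": MLPRegressor(max_iter=2000, random_state=0),
    "L-SVR": LinearSVR(max_iter=10000, random_state=0),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
}

# Scoring power = Pearson correlation between predicted and true affinities
# on the held-out set (the CASF-2016 metric).
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    r, _ = pearsonr(y_test, model.predict(X_test))
    scores[name] = r
    print(f"{name}: Rp = {r:.3f}")
```

To reproduce the paper's two series of trials, one would repeat this loop over training sets that either hold size constant while varying train-test similarity, or hold similarity constant while varying size.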

