Liu Huisi, Su Minyi, Lin Hai-Xia, Wang Renxiao, Li Yan
Department of Chemistry, College of Sciences, Shanghai University, 99 Shangda Road, Shanghai 200444, People's Republic of China.
State Key Laboratory of Bioorganic and Natural Products Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 200032, People's Republic of China.
ACS Omega. 2022 May 26;7(22):18985-18996. doi: 10.1021/acsomega.2c02156. eCollection 2022 Jun 7.
Protein-ligand binding affinity reflects the equilibrium thermodynamics of the protein-ligand binding process; binding/unbinding kinetics is the other side of the coin. Computational models of the quantitative structure-kinetics relationship (QSKR) aim to predict protein-ligand binding/unbinding kinetics from the protein structure, the ligand structure, or the structure of their complex, which in principle can provide a more rational basis for structure-based drug design. Thus far, most of the public data sets used for deriving such QSKR models have been rather limited in sample size and structural diversity. To tackle this problem, we have compiled a set of 680 protein-ligand complexes with experimental dissociation rate constants (k_off), mainly curated from the references accumulated for updating our PDBbind database. The three-dimensional structure of each protein-ligand complex in this data set was either retrieved from the Protein Data Bank or carefully modeled based on a proper template. The entire data set covers 155 types of protein, with dissociation rate constants (k_off) spanning nearly 10 orders of magnitude. To the best of our knowledge, this data set is the largest of its kind reported publicly. Utilizing this data set, we derived a random forest (RF) model based on protein-ligand atom pair descriptors for predicting k_off values. We also demonstrated that using modeled structures as additional training samples improves model performance. The RF model trained on mixed structures (experimental and modeled) can serve as a baseline for benchmarking other, more sophisticated QSKR models. The whole data set is available for free download at our PDBbind-CN web site (http://www.pdbbind.org.cn/download.php).
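As an illustration of what a protein-ligand atom pair descriptor might look like, the following is a minimal sketch that counts element pairs in distance shells and converts k_off to a logarithmic scale. The element types, the shell boundaries, the function names `atom_pair_descriptor` and `pkoff`, and the use of -log10(k_off) as a regression target are all assumptions made for this example; they are not taken from the published model.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical element types and distance shells (in angstroms).
# The descriptor definition used in the paper may differ.
PROTEIN_TYPES = ("C", "N", "O", "S")
LIGAND_TYPES = ("C", "N", "O", "S")
SHELL_EDGES = (0.0, 4.0, 8.0, 12.0)


def atom_pair_descriptor(protein_atoms, ligand_atoms):
    """Count protein-ligand atom pairs per (element, element, shell).

    Each atom is a tuple (element, (x, y, z)). The result is a
    fixed-length list ordered by protein type, ligand type, then shell.
    """
    shells = list(zip(SHELL_EDGES, SHELL_EDGES[1:]))
    counts = Counter()
    for p_elem, p_xyz in protein_atoms:
        if p_elem not in PROTEIN_TYPES:
            continue  # skip elements outside the assumed type set
        for l_elem, l_xyz in ligand_atoms:
            if l_elem not in LIGAND_TYPES:
                continue
            distance = math.dist(p_xyz, l_xyz)
            for shell, (low, high) in enumerate(shells):
                if low <= distance < high:
                    counts[(p_elem, l_elem, shell)] += 1
                    break  # pairs beyond the last shell are ignored
    keys = product(PROTEIN_TYPES, LIGAND_TYPES, range(len(shells)))
    return [counts[key] for key in keys]


def pkoff(koff_per_second):
    """Return -log10(k_off), a log-scale target (an assumed convention)."""
    return -math.log10(koff_per_second)
```

A feature vector of this kind, one per complex, could then be paired with the log-scale k_off value and passed to any random forest regressor; a logarithmic target is a natural choice when the measured values span many orders of magnitude.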