Kumar Surendra, Kim Mi-Hyun
Gachon Institute of Pharmaceutical Science & Department of Pharmacy, College of Pharmacy, Gachon University, 191 Hambakmoeiro, Yeonsu-gu, Incheon, Republic of Korea.
J Cheminform. 2021 Mar 25;13(1):28. doi: 10.1186/s13321-021-00507-1.
In drug discovery, rapid and accurate prediction of protein-ligand binding affinities is pivotal for lead optimization toward acceptable on-target potency and pharmacological efficacy. Researchers also hope for a high correlation between the docking score and poses that engage key interacting residues, but scoring functions, used as surrogates for the free energy of protein-ligand complexes, have failed to provide such collinearity. Recently, various machine learning and deep learning methods have been proposed to overcome the drawbacks of scoring functions. Although these methods are highly accurate, their featurization is complex, and the meaning of the embedded features cannot be interpreted directly by a human without additional feature analysis. Here, we propose SMPLIP-Score (Substructural Molecular and Protein-Ligand Interaction Pattern Score), a directly interpretable predictor of absolute binding affinity. Our simple featurization embeds the interaction fingerprint pattern of the ligand-binding site environment and the molecular fragments of ligands into a vectorized input matrix for the learning layers (random forest or deep neural network). Despite using less complex features than other state-of-the-art models, SMPLIP-Score achieved comparable performance, with a Pearson's correlation coefficient up to 0.80 and a root mean square error up to 1.18 in pK units, on several benchmark datasets (PDBbind v.2015, Astex Diverse Set, CSAR NRC HiQ, FEP, PDBbind NMR, and CASF-2016). The model's generality, predictive power, ranking power, and robustness were examined using direct interpretation of feature matrices for specific targets.
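To make the two reported metrics and the concatenated input concrete, the following is a minimal Python sketch. It is not the authors' implementation: the function names (`build_feature_vector`, `pearson_r`, `rmse`), the toy bit vectors, and the example pK values are all hypothetical, and the sketch assumes only that an interaction fingerprint and a ligand fragment fingerprint are joined into one input row, as the abstract describes.

```python
import math
from typing import List, Sequence


def build_feature_vector(interaction_bits: Sequence[int],
                         fragment_bits: Sequence[int]) -> List[int]:
    """Concatenate an interaction fingerprint with a ligand fragment
    fingerprint into one input row (hypothetical layout, for illustration)."""
    return list(interaction_bits) + list(fragment_bits)


def pearson_r(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson's correlation coefficient between measured and predicted pK."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, in the same units as the inputs (here pK)."""
    n = len(y_true)
    if n != len(y_pred) or n == 0:
        raise ValueError("need two equal-length, non-empty sequences")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


if __name__ == "__main__":
    # Toy, made-up values purely for illustration; not data from the paper.
    row = build_feature_vector([1, 0, 2], [0, 1, 1, 0])
    measured = [6.2, 7.5, 4.9, 8.1]
    predicted = [6.0, 7.0, 5.5, 7.6]
    print(len(row), pearson_r(measured, predicted), rmse(measured, predicted))
```

A higher Pearson's r indicates better agreement in trend between predicted and measured affinities, while a lower RMSE indicates smaller absolute errors in pK units; the two are reported together because a model can score well on one and poorly on the other.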