Holderbach Stefan, Adam Lukas, Jayaram B, Wade Rebecca C, Mukherjee Goutam
Molecular and Cellular Modelling Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.
Institute of Pharmacy and Molecular Biotechnology (IPMB), Heidelberg University, Heidelberg, Germany.
Front Mol Biosci. 2020 Dec 17;7:601065. doi: 10.3389/fmolb.2020.601065. eCollection 2020.
Virtual screening of large numbers of compounds against target protein binding sites has become an integral component of drug discovery workflows. This screening is often done by computationally docking ligands into a protein binding site of interest, but docking has the drawback that a large number of poses must be evaluated to obtain accurate estimates of protein-ligand binding affinity. Here, we introduce a fast pre-filtering method for ligand prioritization that is based on a set of machine learning models and uses simple pose-invariant physicochemical descriptors of the ligands and the protein binding pocket. Our method, Rapid Screening with Physicochemical Descriptors + machine learning (RASPD+), is trained on PDBbind data and, without the need to generate ligand poses, achieves better regression performance than the original RASPD method and traditional scoring functions on a range of different test sets. Additionally, we use RASPD+ to identify molecular features important for binding affinity and assess its ability to enrich active molecules over decoys.
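As an illustration of what enriching actives over decoys means in practice, the sketch below computes an enrichment factor for a ranked compound list. It assumes a commonly used definition (hit rate in the top-ranked fraction divided by the hit rate in the whole library), which may differ from the metric used in the paper. The function name `enrichment_factor`, its parameters, and the toy scores are hypothetical and are not taken from RASPD+.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    is_active: Sequence[bool],
    top_fraction: float = 0.01,
) -> float:
    """Enrichment factor of actives in the top-ranked fraction of a library.

    Assumed definition (not necessarily the one used by RASPD+):
        EF = (actives in top / compounds in top) / (all actives / all compounds)
    Higher scores are assumed to mean stronger predicted binding.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and at least one active")
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")

    # Size of the top-ranked subset, never smaller than one compound.
    n_top = max(1, round(n_total * top_fraction))

    # Rank compounds from best (highest score) to worst.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(1 for _, flag in ranked[:n_top] if flag)

    # Algebraically equal to (hits_top / n_top) / (n_actives / n_total).
    return (hits_top * n_total) / (n_top * n_actives)


# Toy data: 10 compounds with made-up predicted affinities, 2 of them active.
toy_scores = [9.1, 8.7, 8.2, 7.5, 6.9, 6.4, 6.0, 5.5, 5.1, 4.8]
toy_labels = [True, False, True, False, False, False, False, False, False, False]
toy_ef = enrichment_factor(toy_scores, toy_labels, top_fraction=0.2)
print(f"EF at 20%: {toy_ef:.2f}")
```

An enrichment factor above 1 indicates that the ranking places actives in the top fraction more often than random selection would; a value of 1 corresponds to no enrichment.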