Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States.
School of Electronics and Information Engineering, Ankang University, Ankang 725000, China.
J Chem Inf Model. 2023 Nov 13;63(21):6608-6618. doi: 10.1021/acs.jcim.3c01350. Epub 2023 Oct 29.
In this study, we systematically examined the energy distribution of bioactive conformations of small-molecule ligands within their conformational ensembles using ANI-2x, a machine learning potential, in conjunction with one of our recently developed geometry optimization algorithms, a conjugate gradient method with backtracking line search (CG-BS). We first evaluated the combined method (ANI-2x/CG-BS) on two molecule sets. For the 231-molecule set, ab initio calculations were performed at both the ωB97X/6-31G(d) and B3LYP-D3BJ/DZVP levels for accuracy comparison, while for the 8,992-molecule set, ab initio calculations were carried out at the B3LYP-D3BJ/DZVP level. For each molecule in the two sets, up to 10 conformations were generated, which diminishes the influence of individual outliers on the performance evaluation. Encouraged by the performance of ANI-2x/CG-BS in these evaluations, we used it to calculate the energy distributions for more than 27,000 ligands in the Protein Data Bank (PDB). Each ligand has at least one conformation bound to a biological molecule, and this ligand conformation is labeled as a bound conformation. In addition to the bound conformations, up to 200 conformations were generated using OpenEye's Omega2 software (https://docs.eyesopen.com/applications/omega/) for each conformation. We then performed a statistical analysis of how the bound conformation energies are distributed within the ensembles for the 17,197 PDB ligands whose bound conformation energies fall within the energy ranges of their Omega2-generated conformation ensembles. We found that for half of the ligands, the bound conformation lies less than 2.91 kcal/mol above the global minimum conformation, and that about 90% of the bound conformations are within 10 kcal/mol of the global minimum conformation energy.
This information is useful for guiding the construction of libraries for shape-based virtual screening and for improving docking algorithms so that they sample bound conformations efficiently.
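The relative-energy statistics described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that each ligand is represented by one bound-conformation energy plus a list of energies for its generated conformers, that the "global conformation" is the lowest-energy conformer in that generated list, and that ligands whose bound energy falls outside the range of the generated ensemble are excluded. The function names and all energy values are hypothetical and serve only to show the arithmetic.

```python
from statistics import median


def relative_bound_energies(ligands):
    """Return E(bound) - E(lowest generated conformer), in kcal/mol, for every
    ligand whose bound energy lies inside the range of its generated ensemble."""
    rel = []
    for bound_energy, ensemble_energies in ligands:
        lowest, highest = min(ensemble_energies), max(ensemble_energies)
        # Keep only ligands whose bound energy falls within the ensemble range
        # (assumed reading of the filtering step described in the abstract).
        if lowest <= bound_energy <= highest:
            rel.append(bound_energy - lowest)
    return rel


def fraction_within(rel_energies, cutoff):
    """Fraction of relative energies that are at or below the cutoff."""
    return sum(e <= cutoff for e in rel_energies) / len(rel_energies)


# Made-up toy energies in kcal/mol (illustrative only, not data from the study).
toy_ligands = [
    (-100.0, [-103.0, -101.5, -99.0]),  # bound is 3.0 above the lowest conformer
    (-50.0, [-50.5, -49.0, -45.0]),     # bound is 0.5 above the lowest conformer
    (-20.0, [-32.0, -25.0, -18.0]),     # bound is 12.0 above the lowest conformer
    (-10.0, [-9.0, -8.0]),              # bound lies below the ensemble range: excluded
]

rel = relative_bound_energies(toy_ligands)
print("median relative energy:", median(rel))
print("fraction within 10 kcal/mol:", fraction_within(rel, 10.0))
```

Under these assumptions, the median of the relative energies corresponds to the "half of the ligands" figure in the abstract, and the fraction at or below a 10 kcal/mol cutoff corresponds to the "about 90%" figure.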