Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka 820-8502, Japan.
Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.
J Chem Inf Model. 2021 May 24;61(5):2341-2352. doi: 10.1021/acs.jcim.0c01452. Epub 2021 Apr 16.
In structure-based virtual screening (SBVS), a binding site on a protein structure is used to search for ligands with favorable nonbonded interactions. Because it is computationally difficult, docking is time-consuming and any docking user will eventually encounter a chemical library that is too big to dock. This problem might arise because there is not enough computing power or because preparing and storing so many three-dimensional (3D) ligands requires too much space. In this study, however, we show that quality regressors can be trained to predict docking scores from molecular fingerprints. Although typical docking has a screening rate of less than one ligand per second on one CPU core, our regressors can predict about 5800 docking scores per second. This approach allows us to focus docking on the portion of a database that is predicted to have docking scores below a user-chosen threshold. Herein, usage examples are shown, where only 25% of a ligand database is docked, without any significant virtual screening performance loss. We call this method "lean-docking". To validate lean-docking, a massive docking campaign using several state-of-the-art docking software packages was undertaken on an unbiased data set, with only wet-lab tested active and inactive molecules. Although regressors allow the screening of a larger chemical space, even at a constant docking power, it is also clear that significant progress in the virtual screening power of docking scores is desirable.
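The lean-docking workflow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the fingerprints are random synthetic bit vectors, the "docking scores" are simulated, a closed-form ridge regression stands in for whatever regressor the study actually trained, and the names `fit_ridge`, `predict`, and `lean_select` are hypothetical. The sketch trains on a small "already docked" subset, predicts scores for the rest of the library, and keeps only the 25% with the lowest predicted scores for real docking.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an intercept (stand-in regressor)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b

def predict(X, w, b):
    """Predicted docking scores for a fingerprint matrix."""
    return X @ w + b

def lean_select(pred_scores, fraction=0.25):
    """Return indices of the `fraction` of ligands with the lowest predicted
    scores, plus the implied score threshold (lower score = better)."""
    n_keep = max(1, int(round(fraction * len(pred_scores))))
    keep = np.argsort(pred_scores)[:n_keep]
    return keep, pred_scores[keep[-1]]

# Synthetic stand-ins (assumption): 2000 ligands, 256-bit fingerprints,
# simulated docking scores with noise.
rng = np.random.default_rng(0)
n_ligands, n_bits = 2000, 256
X = (rng.random((n_ligands, n_bits)) < 0.1).astype(float)
true_w = rng.normal(size=n_bits)
y = X @ true_w - 7.0 + rng.normal(scale=0.5, size=n_ligands)

# Pretend the first 500 ligands were docked; the rest form the undocked pool.
X_train, y_train = X[:500], y[:500]
X_pool, y_pool = X[500:], y[500:]

w, b = fit_ridge(X_train, y_train)
pred = predict(X_pool, w, b)
selected, threshold = lean_select(pred, fraction=0.25)

print(f"ligands sent to docking: {len(selected)} of {len(pred)}")
print(f"predicted-score threshold: {threshold:.2f}")
print(f"mean simulated score, selected vs. pool: "
      f"{y_pool[selected].mean():.2f} vs. {y_pool.mean():.2f}")
```

In practice the threshold would be chosen by the user rather than derived from a fixed fraction, and the fingerprints would come from a cheminformatics toolkit. As a rough sense of scale, at the reported rate of about 5800 predictions per second, scoring one million ligands with the regressor would take about three minutes on one core, after which only the selected quarter needs to be docked.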