Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, U.K.
J Chem Inf Model. 2021 Dec 27;61(12):5841-5852. doi: 10.1021/acs.jcim.1c00866. Epub 2021 Nov 18.
Ligand-based methods play a crucial role in virtual screening when the 3D structure of the target is not available. This study reports the validation of the CSD field-based ligand screener on a novel benchmarking data set containing 56 targets. The data set was built from the target UniProt IDs in a previously published data set (the AZ data set): ChEMBL was mined for known active molecules against these targets, and DUD-E was used to generate property-matched decoys for the identified actives. Several experiments were performed to assess the virtual screening performance of the new method. One of its strengths is that it can use an overlay of multiple flexible ligands as a single query, without the need to run several parallel calculations with one ligand at a time. We discuss how changes to parameter settings or the adoption of different query models influence the final performance relative to that obtained with the experimentally observed overlay of ligands. We also generated enrichment scores for three external benchmark data sets to enable comparison with existing methods previously validated on them: the standard DUD-E data set, the DUD-E+ data set, and the DUD_Lib_VS_1.0 data set, which was designed for ligand-based virtual screening validation and is hence more suitable for this type of method.
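The abstract refers to enrichment scores without saying which metric was used. As an illustration only, the sketch below computes the commonly used enrichment factor (EF) at a chosen fraction of a ranked screening list. The function name `enrichment_factor`, its parameters, and the toy scores are hypothetical and are not taken from the paper or from the CSD software. It assumes that a higher score means a better rank.

```python
import math
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    is_active: Sequence[bool],
    fraction: float = 0.01,
) -> float:
    """Enrichment factor at the top `fraction` of a score-ranked list.

    EF = (actives in top subset / subset size) / (all actives / list size).
    Assumption: higher score = better rank. Ties keep their input order.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one molecule and one active")

    # Rank molecule indices by descending score (sorted() is stable).
    order = sorted(range(n_total), key=lambda i: scores[i], reverse=True)

    # Size of the top subset, rounded up so it is never empty.
    n_top = max(1, math.ceil(fraction * n_total))
    hits = sum(1 for i in order[:n_top] if is_active[i])

    return (hits / n_top) / (n_actives / n_total)


# Toy example: 2 actives among 10 molecules, both ranked first.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [True, True] + [False] * 8
ef_top20 = enrichment_factor(toy_scores, toy_labels, fraction=0.2)
ef_full = enrichment_factor(toy_scores, toy_labels, fraction=1.0)
```

In this toy case the top 20% of the list holds both actives, so the EF reaches its maximum of 5 for that fraction. Evaluating the whole list always gives an EF of 1, which is the random-selection baseline.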