Irwin John J
Department of Pharmaceutical Chemistry, University of California San Francisco, PO Box 2550, Byers Hall, San Francisco, CA 94158-2330, USA.
J Comput Aided Mol Des. 2008 Mar-Apr;22(3-4):193-9. doi: 10.1007/s10822-008-9189-4. Epub 2008 Feb 14.
Ligand enrichment among top-ranking hits is a key metric of virtual screening. To avoid bias, decoys should resemble ligands physically, so that enrichment is not attributable to simple differences of gross features. We therefore created a directory of useful decoys (DUD) by selecting decoys that resembled annotated ligands physically but not topologically to benchmark docking performance. DUD has 2950 annotated ligands and 95,316 property-matched decoys for 40 targets. It is by far the largest and most comprehensive public data set for benchmarking virtual screening programs that I am aware of. This paper outlines several ways that DUD can be improved to provide better telemetry to investigators seeking to understand both the strengths and the weaknesses of current docking methods. I also highlight several pitfalls for the unwary: a risk of over-optimization, questions about chemical space, and the proper scope for using DUD. Careful attention to both the composition of benchmarks and how they are used is essential to avoid being misled by overfitting and bias.
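As an illustration of the enrichment metric the abstract refers to, here is a minimal sketch of a commonly used enrichment factor: the fraction of ligands in the top-ranked slice divided by the fraction of ligands in the whole ranked set. The function name, the `top_fraction` parameter, and the convention that a lower score means a better rank are assumptions made for this example; they are not taken from the paper or from DUD itself.

```python
def enrichment_factor(scores, is_ligand, top_fraction=0.01):
    """Enrichment factor at a given top fraction of a ranked list.

    Assumptions (illustrative, not from the paper):
      - lower score = better rank, as with many docking energies
      - is_ligand holds 1 for annotated ligands and 0 for decoys
    """
    if len(scores) != len(is_ligand) or not scores:
        raise ValueError("scores and is_ligand must be non-empty and equal in length")
    total = len(scores)
    total_ligands = sum(is_ligand)
    if total_ligands == 0:
        raise ValueError("at least one ligand is required")

    # Size of the top-ranked slice; always inspect at least one compound.
    n_top = max(1, int(round(total * top_fraction)))

    # Rank best-first and count ligands inside the top slice.
    ranked = sorted(zip(scores, is_ligand), key=lambda pair: pair[0])
    ligands_in_top = sum(flag for _, flag in ranked[:n_top])

    # Ligand rate in the top slice relative to the ligand rate overall.
    return (ligands_in_top / n_top) / (total_ligands / total)


# Toy data: 2 ligands and 8 decoys, with both ligands ranked first.
toy_scores = [-10, -9, -8, -7, -6, -5, -4, -3, -2, -1]
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_ef = enrichment_factor(toy_scores, toy_labels, top_fraction=0.2)
```

A value near 1 means the ranking is no better than random selection. The abstract's point about property-matched decoys is that a high value should not arise simply because ligands and decoys differ in gross physical features.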