Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr 2, D-53113 Bonn, Germany.
J Chem Inf Model. 2011 Oct 24;51(10):2467-73. doi: 10.1021/ci200309j. Epub 2011 Sep 26.
Benchmark calculations are essential for evaluating virtual screening (VS) methods. Typically, classes of known active compounds are taken from the medicinal chemistry literature and divided into reference molecules (search templates) and potential hits. The potential hits are added to background databases assumed to consist of compounds that do not share this activity. VS calculations are then carried out, and the recall of known active compounds is determined. However, conventional benchmarking is affected by a number of problems that reduce its value for method evaluation. In addition to frequently insufficient statistical validation and the lack of generally accepted evaluation standards, the artificial nature of typical benchmark settings is often criticized. Retrospective benchmark calculations generally overestimate the potential of VS methods, and their results do not scale with performance in prospective applications. To provide additional benchmarking opportunities that more closely resemble practical VS conditions, we have designed a publicly available compound database (DB) of reproducible virtual screens (REPROVIS-DB). It organizes information from successful ligand-based VS applications, including reference compounds, screening databases, compound selection criteria, and experimentally confirmed hits. Using the 25 hand-selected compound data sets currently available, one can attempt to reproduce successful virtual screens with methods other than those originally applied and assess the potential of these methods for practical applications.
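The conventional recall-based benchmark described in this abstract can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the paper. Fingerprints are modeled as sets of "on" bit positions. Database compounds are ranked by their maximum Tanimoto similarity to any reference molecule. Recall is measured as the fraction of known actives retrieved within the top n ranked compounds. The helpers `tanimoto`, `rank_database`, and `recall_at_n` are hypothetical and are not part of REPROVIS-DB.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical representation: a binary fingerprint as the set of its "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_database(references: List[Fingerprint],
                  database: Dict[str, Fingerprint]) -> List[str]:
    """Rank database compounds by their maximum similarity to any reference (max fusion)."""
    scores = {cid: max(tanimoto(fp, ref) for ref in references)
              for cid, fp in database.items()}
    # sorted() is stable, so tied compounds keep their database order.
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)


def recall_at_n(ranking: List[str], actives: Set[str], n: int) -> float:
    """Fraction of known actives found among the top-n ranked compounds."""
    if not actives:
        raise ValueError("actives must be non-empty")
    return len(set(ranking[:n]) & actives) / len(actives)


if __name__ == "__main__":
    # Toy data: one reference molecule, two "hidden" actives and two background decoys.
    references = [frozenset({1, 2, 3, 4})]
    database = {
        "hit1": frozenset({1, 2, 3, 5}),
        "hit2": frozenset({1, 2, 6, 7}),
        "decoy1": frozenset({8, 9, 10}),
        "decoy2": frozenset({1, 8, 9}),
    }
    ranking = rank_database(references, database)
    print(ranking)
    print(recall_at_n(ranking, {"hit1", "hit2"}, n=2))
```

On the toy data, both hidden actives rank above the decoys, so recall within the top two compounds is 1.0. The abstract's criticism applies directly to this setup. The decoys are only *assumed* inactive, and a high retrospective recall need not translate into prospective success.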