Informatics Institute, University of Missouri, Columbia, Missouri 65211, USA.
J Chem Inf Model. 2013 Aug 26;53(8):1905-14. doi: 10.1021/ci400045v. Epub 2013 May 21.
In this study, we use the recently released 2012 Community Structure-Activity Resource (CSAR) data set to evaluate two knowledge-based scoring functions, ITScore and STScore, and a simple force-field-based potential (VDWScore). The CSAR data set contains 757 compounds, most with known affinities, and 57 crystal structures. With the help of the script files for docking preparation, we use the full CSAR data set to evaluate the performances of the scoring functions on binding affinity prediction and active/inactive compound discrimination. The CSAR subset that includes crystal structures is used as well, to evaluate the performances of the scoring functions on binding mode and affinity predictions. Within this structure subset, we investigate the importance of accurate ligand and protein conformational sampling and find that the binding affinity predictions are less sensitive to non-native ligand and protein conformations than the binding mode predictions. We also find the full CSAR data set to be more challenging in making binding mode predictions than the subset with structures. The script files used for preparing the CSAR data set for docking, including scripts for canonicalization of the ligand atoms, are offered freely to the academic community.
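The two kinds of evaluation named in the abstract, binding affinity prediction and binding mode prediction, are commonly quantified with a score-versus-affinity correlation and an RMSD-based success rate, respectively. The following is a minimal sketch of those generic metrics, not the authors' scripts: the function names, the 2.0 Å success cutoff, and the toy numbers are all assumptions made for illustration. The RMSD routine assumes both poses list their atoms in the same order, which is one reason a canonical ligand atom ordering is useful.

```python
import math


def pearson_r(scores, affinities):
    """Pearson correlation between predicted scores and measured affinities."""
    n = len(scores)
    if n != len(affinities) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_s = sum(scores) / n
    mean_a = sum(affinities) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(scores, affinities))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_a = sum((a - mean_a) ** 2 for a in affinities)
    if var_s == 0 or var_a == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_s * var_a)


def rmsd(pose, reference):
    """Root-mean-square deviation between two poses, without superposition.

    Assumes atom i in `pose` corresponds to atom i in `reference`.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same atom count")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(pose, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(pose))


def success_rate(rmsds, cutoff=2.0):
    """Fraction of predicted poses within `cutoff` (assumed 2.0 Å) of the crystal pose."""
    if not rmsds:
        raise ValueError("no RMSD values given")
    return sum(1 for r in rmsds if r <= cutoff) / len(rmsds)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    toy_scores = [-9.1, -7.4, -8.2, -6.0]
    toy_affinities = [8.5, 6.9, 7.8, 5.5]  # e.g. pKd values
    print("Pearson r:", pearson_r(toy_scores, toy_affinities))
    print("Success rate:", success_rate([0.8, 2.6, 1.7, 3.4]))
```

In this sketch the sign of the correlation depends on the score convention: if more negative scores mean stronger predicted binding and affinities are given as pKd, a good scoring function yields a strongly negative r.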