Liu Yunchao Lance, Dong Ha, Wang Xin, Moretti Rocco, Wang Yu, Su Zhaoqian, Gu Jiawei, Bodenheimer Bobby, Weaver Charles David, Meiler Jens, Derr Tyler
Computer Science Dept., Vanderbilt University (VU).
Neural Science Dept., Amherst College.
ArXiv. 2024 Nov 14:arXiv:2411.09820v1.
While deep learning has revolutionized computer-aided drug discovery, the AI community has predominantly focused on model innovation and placed less emphasis on establishing best benchmarking practices. We posit that without a sound model evaluation framework, the AI community's efforts cannot reach their full potential, which slows the progress and transfer of innovation into real-world drug discovery. Thus, in this paper, we seek to establish a new gold standard for small molecule drug discovery benchmarking, WelQrate. Specifically, our contributions are threefold: (1) we introduce a meticulously curated collection of 9 datasets spanning 5 therapeutic target classes. Our hierarchical curation pipelines, designed by drug discovery experts, go beyond the primary high-throughput screen by leveraging additional confirmatory and counter screens along with rigorous domain-driven preprocessing, such as Pan-Assay Interference Compounds (PAINS) filtering, to ensure high data quality; (2) we propose a standardized model evaluation framework covering high-quality datasets, featurization, 3D conformation generation, evaluation metrics, and data splits, which provides reliable benchmarking for drug discovery experts conducting real-world virtual screening; (3) we evaluate model performance through various research questions using the dataset collection, exploring the effects of different models, dataset quality, featurization methods, and data splitting strategies on the results. In summary, we recommend adopting our proposed WelQrate as the gold standard in small molecule drug discovery benchmarking. The dataset collection, curation code, and experimental scripts are all publicly available at WelQrate.org.