Target-driven machine learning-enabled virtual screening (TAME-VS) platform for early-stage hit identification.

Author information

Bian Yuemin, Kwon Jason J, Liu Cong, Margiotta Enrico, Shekhar Mrinal, Gould Alexandra E

Affiliations

Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA, United States.

Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States.

Publication information

Front Mol Biosci. 2023 Mar 13;10:1163536. doi: 10.3389/fmolb.2023.1163536. eCollection 2023.

DOI:10.3389/fmolb.2023.1163536

PMID:36994428

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10040869/

Abstract

High-throughput screening (HTS) methods enable the empirical evaluation of large numbers of compounds and can be augmented by virtual screening (VS) techniques, which save time and money by prioritizing potentially active compounds for experimental testing. Structure-based and ligand-based virtual screening approaches have been extensively studied and applied in drug discovery practice, with proven outcomes in advancing candidate molecules. However, the experimental data required for VS are expensive to generate, and effective, efficient hit identification is particularly challenging during early-stage drug discovery for novel protein targets. Herein, we present our TArget-driven Machine learning-Enabled VS (TAME-VS) platform, which leverages existing chemical databases of bioactive molecules to facilitate hit finding in a modular fashion. Our methodology enables bespoke hit identification campaigns starting from a user-defined protein target. The input target ID is used to perform a homology-based target expansion, followed by compound retrieval from a large compilation of molecules with experimentally validated activity. The retrieved compounds are then vectorized and used to train machine learning (ML) models. These models are deployed to perform model-based inferential virtual screening, and compounds are nominated based on predicted activity. Our platform was retrospectively validated across ten diverse protein targets and demonstrated clear predictive power. The implemented methodology provides a flexible and efficient approach that is accessible to a wide range of users. The TAME-VS platform is publicly available at https://github.com/bymgood/Target-driven-ML-enabled-VS to facilitate early-stage hit identification.
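The workflow described in the abstract (target expansion, compound retrieval, vectorization, model-based scoring, nomination) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the target IDs, the homolog map, and the activity labels are hypothetical placeholders, the character n-gram "fingerprint" is a toy stand-in for a real chemical fingerprint, and the similarity-based scorer is a simple substitute for the trained ML models the platform uses.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data: target IDs, homolog map, and activity labels are invented
# for illustration only and do not come from any real database.
HOMOLOGS: Dict[str, List[str]] = {"TARGET_A": ["TARGET_B"]}
ACTIVITY_TABLE: List[Tuple[str, str, bool]] = [
    ("TARGET_A", "c1ccccc1O", True),
    ("TARGET_B", "c1ccccc1N", True),
    ("TARGET_A", "CCCCCC", False),
    ("TARGET_B", "CCCCO", False),
    ("TARGET_C", "CC(=O)O", True),  # unrelated target; ignored after expansion
]
SCREENING_LIBRARY: List[str] = ["CCCCC", "c1ccccc1C"]


def expand_targets(query: str, homologs: Dict[str, List[str]]) -> List[str]:
    """Step 1: expand the query target with its (assumed known) homologs."""
    return [query] + homologs.get(query, [])


def retrieve_compounds(
    targets: List[str], table: List[Tuple[str, str, bool]]
) -> Tuple[List[str], List[str]]:
    """Step 2: collect active and inactive SMILES reported for the targets."""
    wanted = set(targets)
    actives = [smi for tgt, smi, act in table if tgt in wanted and act]
    inactives = [smi for tgt, smi, act in table if tgt in wanted and not act]
    return actives, inactives


def ngram_fingerprint(smiles: str, n: int = 3) -> Set[str]:
    """Step 3: toy vectorization as the set of character n-grams of a SMILES."""
    if len(smiles) < n:
        return {smiles}
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_similarity(fp: Set[str], references: List[Set[str]]) -> float:
    """Average similarity of one fingerprint to a list of reference ones."""
    if not references:
        return 0.0
    return sum(tanimoto(fp, ref) for ref in references) / len(references)


def screen(
    library: List[str], actives: List[str], inactives: List[str]
) -> List[Tuple[str, float]]:
    """Steps 4-5: score each library compound and rank by predicted activity.

    Score = mean similarity to actives minus mean similarity to inactives.
    This is a stand-in for a trained classifier's predicted probability.
    """
    active_fps = [ngram_fingerprint(s) for s in actives]
    inactive_fps = [ngram_fingerprint(s) for s in inactives]
    scored = []
    for smi in library:
        fp = ngram_fingerprint(smi)
        score = mean_similarity(fp, active_fps) - mean_similarity(fp, inactive_fps)
        scored.append((smi, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


targets = expand_targets("TARGET_A", HOMOLOGS)
actives, inactives = retrieve_compounds(targets, ACTIVITY_TABLE)
ranked = screen(SCREENING_LIBRARY, actives, inactives)
for smiles, score in ranked:
    print(f"{smiles}\t{score:.3f}")
```

In a realistic setting, the n-gram features would be replaced by cheminformatics fingerprints and the scorer by a supervised model trained on the retrieved actives and inactives; the sketch only shows how the stages hand data to one another.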
