Novartis Institutes for BioMedical Research, Emeryville, California 94608, United States.
Novartis Institutes for BioMedical Research, San Diego, California 92121, United States.
J Chem Inf Model. 2024 Apr 8;64(7):2695-2704. doi: 10.1021/acs.jcim.3c01855. Epub 2024 Jan 31.
Predicting compound activity in assays is a long-standing challenge in drug discovery. Computational models based on compound-induced gene expression signatures from a single profiling assay have shown promise for predicting compound activity in other, seemingly unrelated, assays. Applications of such models include predicting mechanisms of action (MoA) for phenotypic hits, identifying off-target activities, and identifying polypharmacologies. Here, we introduce transcriptomics-to-activity transformer (TAT) models that leverage gene expression profiles measured after compound treatment at multiple concentrations to predict compound activity in other biochemical or cellular assays. We built TAT models based on gene expression data from a RASL-seq assay to predict the activity of 2692 compounds in 262 dose-response assays. We obtained useful models for 51% of the assays, as determined through a realistic held-out set. Prospectively, we experimentally validated the activity predictions of a TAT model in a malaria inhibition assay. With a 63% hit rate, TAT successfully identified several submicromolar malaria inhibitors. Our results thus demonstrate the potential of concentration-dependent transcriptomic responses and the TAT modeling framework as a cost-efficient way to identify the bioactivities of promising compounds across many assays.