Breaking the barriers of data scarcity in drug-target affinity prediction.

Affiliations

Gaoling School of Artificial Intelligence, Renmin University of China, No.59, Zhong Guan Cun Avenue, Haidian District, 100872, Beijing, China.

Microsoft Research AI4Science, No.5, Dan Ling Street, Haidian District, 100080, Beijing, China.

Publication information

Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad386.

PMID:37903413

Abstract

Accurate prediction of drug-target affinity (DTA) is of vital importance in early-stage drug discovery, facilitating the identification of drugs that can effectively interact with specific targets and regulate their activities. While wet-lab experiments remain the most reliable method, they are time-consuming and resource-intensive, resulting in limited data availability, which poses challenges for deep learning approaches. Existing methods have primarily focused on developing techniques based on the available DTA data, without adequately addressing the data scarcity issue. To overcome this challenge, we present the Semi-Supervised Multi-task training (SSM) framework for DTA prediction, which incorporates three simple yet highly effective strategies: (1) A multi-task training approach that combines DTA prediction with masked language modeling using paired drug-target data. (2) A semi-supervised training method that leverages large-scale unpaired molecules and proteins to enhance drug and target representations. This approach differs from previous methods that only employed molecules or proteins in pre-training. (3) The integration of a lightweight cross-attention module to improve the interaction between drugs and targets, further enhancing prediction accuracy. Through extensive experiments on benchmark datasets such as BindingDB, DAVIS and KIBA, we demonstrate the superior performance of our framework. Additionally, we conduct case studies on specific drug-target binding activities, virtual screening experiments, drug feature visualizations and real-world applications, all of which showcase the significant potential of our work. In conclusion, our proposed SSM-DTA framework addresses the data limitation challenge in DTA prediction and yields promising results, paving the way for more efficient and accurate drug discovery processes.
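
The idea behind strategy (3), letting drug tokens attend to target residues, can be illustrated with a generic sketch. This is not the SSM-DTA implementation, whose exact architecture the abstract does not specify. The snippet assumes a single attention head, identity query/key/value projections in place of learned weights, and hypothetical toy embeddings (`drug_tokens`, `protein_tokens`). It uses standard scaled dot-product attention so that each drug token is re-expressed as a weighted mixture of protein-residue embeddings.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = tokens, columns = embedding dimensions


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention in which queries come from one
    sequence (e.g. drug tokens) and keys/values come from another (e.g. protein
    residues). Assumption: learned Q/K/V projections are omitted (identity)."""
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    scores = matmul(queries, transpose(keys))  # shape: (n_queries, n_keys)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, values)  # shape: (n_queries, d_value)


# Hypothetical toy embeddings: 2 drug tokens and 3 protein residues, dimension 4.
drug_tokens = [[0.1, 0.2, 0.0, 0.5], [0.3, 0.0, 0.4, 0.1]]
protein_tokens = [[0.2, 0.1, 0.3, 0.0], [0.0, 0.5, 0.1, 0.2], [0.4, 0.0, 0.0, 0.3]]

# Each drug token becomes a weighted mix of the protein residues it attends to.
drug_context = cross_attention(drug_tokens, protein_tokens, protein_tokens)
```

In a trained model of this kind, the projections would be learned, multiple heads might be used, and the contextualized drug and target representations would feed a regression head that outputs the affinity value; those components are left out of this sketch.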
