Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:4348-4353. doi: 10.1109/EMBC46164.2021.9629695.
Understanding the interactions between novel drugs and target proteins is fundamentally important in disease research, because discovering drug-protein interactions can be an exceptionally time-consuming and expensive process. As an alternative, this process can be simulated using modern deep learning methods, which have the potential to use vast quantities of data to reduce the cost and time required to provide accurate predictions. We seek to leverage a set of BERT-style models that have been pre-trained on vast quantities of both protein and drug data. The encodings produced by each model are then used as node representations for a graph convolutional neural network, which in turn is used to model the interactions without the need to simultaneously fine-tune both the protein and drug BERT models to the task. We evaluate the performance of our approach on two drug-target interaction datasets that were used as benchmarks in recent work. Our results significantly improve upon a vanilla BERT baseline as well as the former state-of-the-art methods for each task dataset. Our approach builds upon past work in two key areas. First, we take full advantage of two large pre-trained BERT models that provide improved representations of task-relevant properties of both drugs and proteins. Second, inspired by work in natural language processing that investigates how linguistic structure is represented in such models, we perform interpretability analyses that allow us to locate functionally relevant areas of interest within each drug and protein. By modelling the drug-target interactions as a graph rather than as a set of isolated interactions, we demonstrate the benefits of combining large pre-trained models and a graph neural network to make state-of-the-art predictions of drug-target binding affinity.
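To make the idea of using encoder outputs as node representations for a graph convolution more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy graph of two drugs and three proteins, random vectors standing in for the frozen BERT encodings (assumed to share one dimension), the common symmetric normalisation with self-loops for the graph convolution, and a dot-product read-out for a drug-protein pair. All names, sizes, and edges here are hypothetical.

```python
import math
import random


def normalized_adjacency(n, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph as a dense matrix."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0
    deg = [sum(row) for row in a]  # degrees include the self-loop, so never zero
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def gcn_layer(a_hat, h, w):
    """One graph-convolution step: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def score(h, drug, protein):
    """Hypothetical read-out: dot product of the two node embeddings."""
    return sum(p * q for p, q in zip(h[drug], h[protein]))


rng = random.Random(0)

# Toy graph (assumption): nodes 0-1 are drugs, nodes 2-4 are proteins.
# Edges stand for drug-protein pairs observed in training data.
num_nodes = 5
edges = [(0, 2), (0, 3), (1, 3), (1, 4)]

# Placeholder node features standing in for frozen encoder outputs.
dim_in, dim_out = 8, 4
features = [[rng.gauss(0, 1) for _ in range(dim_in)] for _ in range(num_nodes)]

# Untrained layer weights; in practice these would be learned.
weights = [[rng.gauss(0, 0.5) for _ in range(dim_out)] for _ in range(dim_in)]

a_hat = normalized_adjacency(num_nodes, edges)
hidden = gcn_layer(a_hat, features, weights)

# Untrained score for a pair that has no edge in the toy graph.
print(score(hidden, 0, 4))
```

Only the graph layer's weights would be trained in such a setup. The node features stay fixed, which reflects the point above about not having to fine-tune both encoders at once.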