Beijing Institute of Microbiology and Epidemiology, Beijing 100850, China.
Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China.
J Chem Inf Model. 2022 Sep 26;62(18):4380-4390. doi: 10.1021/acs.jcim.2c00960. Epub 2022 Sep 2.
Accurately predicting the binding affinity of protein-ligand pairs is an essential part of drug discovery. Because wet-laboratory experiments to determine binding affinity are expensive and time-consuming, several computational methods for binding affinity prediction have been proposed. When representing compounds, most of these methods focus only on structural information, such as SMILES strings, and ignore bioactive properties. In this study, we propose PLA-MoRe, a novel model for predicting protein-ligand binding affinity that represents compounds by both their structural and their bioactive properties and consists mainly of three feature extractors. First, a structure feature extractor based on the graph isomorphism network learns representations of molecular graphs. Second, an autoencoder-based bioactive feature extractor integrates multisource bioactive information, including chemical, target, network, cellular, and clinical data. Together, these two extractors learn compound representations in terms of structure and bioactivity, respectively. Third, a sequence feature extractor learns embeddings for protein sequences. The outputs of the three extractors are concatenated and fed into a fully connected network for affinity prediction. We compared PLA-MoRe with three state-of-the-art methods and conducted an ablation study to assess the contribution of each part of the model. Attention visualization further showed that the model has the potential to locate binding sites, which might help explain the mechanism of interaction. These results indicate that PLA-MoRe is competitive and reliable. The source code is freely available on GitHub at https://github.com/QingyuLiaib/PLA-MoRe.
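The fusion step described in the abstract (three feature vectors concatenated and passed to a fully connected network) can be illustrated with a minimal sketch. This is not the authors' implementation: the vector sizes, the single hidden layer, the ReLU activation, the random untrained weights, and all names (`predict_affinity`, `init_layer`, `STRUCT_DIM`, and so on) are assumptions made for illustration only, and the three input vectors stand in for the outputs of the structure, bioactive, and sequence extractors.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(x, weights, bias):
    """Dense layer: one dot product plus bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in x]


def predict_affinity(structure_vec, bioactive_vec, protein_vec, layers):
    """Concatenate the three feature vectors and apply the dense layers."""
    x = structure_vec + bioactive_vec + protein_vec  # list concatenation
    for weights, bias in layers[:-1]:
        x = relu(linear(x, weights, bias))
    weights, bias = layers[-1]  # final layer has a single output unit
    return linear(x, weights, bias)[0]


# Hypothetical dimensions; the abstract does not state the real ones.
STRUCT_DIM, BIO_DIM, PROT_DIM, HIDDEN_DIM = 8, 6, 10, 16

rng = random.Random(0)
# Placeholder vectors standing in for the three extractors' outputs.
structure_vec = [rng.random() for _ in range(STRUCT_DIM)]
bioactive_vec = [rng.random() for _ in range(BIO_DIM)]
protein_vec = [rng.random() for _ in range(PROT_DIM)]

layers = [
    init_layer(STRUCT_DIM + BIO_DIM + PROT_DIM, HIDDEN_DIM, rng),
    init_layer(HIDDEN_DIM, 1, rng),
]

affinity = predict_affinity(structure_vec, bioactive_vec, protein_vec, layers)
print(f"predicted affinity (untrained, arbitrary units): {affinity:.4f}")
```

Because the weights are untrained, the printed number carries no chemical meaning; the sketch only shows how the compound and protein representations meet in a single input vector before regression.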