Department of Computer Science and Engineering, Korea University, 02841, Seoul, Korea.
AIGEN Sciences, 04778, Seoul, Korea.
Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad484.
Predicting interactions between compounds and proteins is crucial for discovering new drugs. However, previous sequence-based studies have not used three-dimensional (3D) information on compounds and proteins, such as atom coordinates and distance matrices, to predict binding affinity. Furthermore, many widely adopted computational techniques represent proteins as sequences of amino acid characters. This representation may constrain a model's ability to capture meaningful biochemical features, impeding a more comprehensive understanding of the underlying proteins. Here, we propose MulinforCPI, a two-step deep learning strategy that combines transfer learning with multi-level resolution features to overcome these limitations. Our approach leverages 3D information from both proteins and compounds and learns atomic-level features of proteins. In addition, our research highlights the divide between first-principles and data-driven methods, offering new research prospects for compound-protein interaction tasks. To evaluate its effectiveness, we applied the proposed method to six datasets: Davis, Metz, KIBA, CASF-2016, DUD-E and BindingDB.
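To make the 3D inputs named in the abstract concrete, the following is a minimal sketch of how a distance matrix can be derived from atom coordinates. It is an illustration only and not the MulinforCPI implementation: the function name `distance_matrix` and the example coordinates are hypothetical, and a real pipeline would typically read coordinates from a structure file and use an array library.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances.

    coords: a list of (x, y, z) tuples, one per atom.
    """
    n = len(coords)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            # The matrix is symmetric, so fill both triangles at once.
            dist[i][j] = d
            dist[j][i] = d
    return dist


# Hypothetical coordinates (in angstroms) for three atoms; not from the paper.
atoms = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 1.0)]
matrix = distance_matrix(atoms)
```

Unlike raw coordinates, a matrix of this kind does not change when the whole molecule is rotated or translated, which is one common reason distance matrices are used as structural features.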