Lee Jonghyun, Jun Dae Won, Song Ildae, Kim Yun
Department of Medical and Digital Engineering, Hanyang University College of Engineering, 222, Wangsimni-ro, Seongdong-gu, Seoul, 04763, Korea.
Department of Internal Medicine, Hanyang University College of Medicine, 222, Wangsimni-ro, Seongdong-gu, Seoul, 04763, Korea.
J Cheminform. 2024 Feb 1;16(1):14. doi: 10.1186/s13321-024-00808-1.
Drug discovery is demanding and time-consuming, and machine learning methods are increasingly proposed to make it more efficient. A central challenge is predicting whether a drug molecule's structure will interact with a target protein. A recent study addressed this challenge with encoders that leverage prior knowledge of molecular and protein structures, which notably improved prediction performance on the drug-target interaction task. However, the computational complexity of the target encoders used in previous studies grows quadratically with input length, which limits their practical utility. To overcome this limitation, we adopt a hint-based learning strategy to develop a compact and efficient target encoder. Through an adaptation parameter, our model blends general knowledge and target-oriented knowledge to build features of protein sequences. This approach yielded considerable performance gains and improved learning efficiency on three benchmark datasets: BIOSNAP, DAVIS, and Binding DB. Furthermore, our method requires only 7.7 GB of video RAM (VRAM) during training, which is 16.24% of the amount required by the previous state-of-the-art model. This makes training and inference feasible even with constrained computational resources.
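The abstract does not say how the adaptation parameter combines the two kinds of knowledge, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes a sigmoid-gated convex combination of two equal-length feature vectors; the names `blend_features`, `raw_lambda`, and `attention_score_count` are hypothetical. It also illustrates the quadratic cost mentioned above, assuming the earlier target encoders use standard self-attention, and works out the baseline memory footprint implied by the two reported figures.

```python
import math


def sigmoid(x: float) -> float:
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def blend_features(general, target_oriented, raw_lambda):
    """Mix two feature vectors with a single adaptation parameter.

    ASSUMPTION: the blend is a convex combination gated by sigmoid(raw_lambda).
    The abstract only states that an adaptation parameter blends general and
    target-oriented knowledge; the exact form is not given there.
    """
    if len(general) != len(target_oriented):
        raise ValueError("feature vectors must have the same length")
    lam = sigmoid(raw_lambda)  # weight given to the general features
    return [lam * g + (1.0 - lam) * t for g, t in zip(general, target_oriented)]


def attention_score_count(seq_len: int) -> int:
    """Number of pairwise scores in one full self-attention map.

    ASSUMPTION: the quadratic cost comes from standard self-attention,
    which compares every position with every other position.
    """
    return seq_len * seq_len


# With raw_lambda = 0 the gate is 0.5, so both sources count equally.
blended = blend_features([1.0, 2.0], [3.0, 4.0], raw_lambda=0.0)

# Doubling the protein sequence length quadruples the pairwise scores.
growth = attention_score_count(2000) / attention_score_count(1000)

# Derived from the reported 7.7 GB and 16.24%; not a figure stated in the abstract.
implied_baseline_gb = 7.7 / 0.1624

print(blended, growth, round(implied_baseline_gb, 1))
```

In a trained model, `raw_lambda` would typically be a learnable scalar updated together with the encoder weights, so the balance between the two feature sources is fitted to the data rather than fixed by hand.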