School of Electronics and Information Engineering, Beibu Gulf University, Qinzhou, Guangxi, China.
School of Computer, Electronic and Information, Guangxi University, Nanning, Guangxi, China.
Artif Intell Med. 2024 Mar;149:102778. doi: 10.1016/j.artmed.2024.102778. Epub 2024 Jan 18.
Many computational methods have been proposed to identify potential drug-target interactions (DTIs) and thereby expedite drug development. Graph neural network (GNN) methods are considered among the most effective. However, shallow GNNs can aggregate only local information around each node, whereas deep GNNs may suffer from over-smoothing when gathering long-distance neighbourhood information. As a result, existing GNN methods struggle to extract the complete features of the graph. In addition, known DTIs are scarce and are far outnumbered by unknown drug-target pairs, which leads to class imbalance. This article proposes SSLDTI, a model that combines a graph autoencoder with self-supervised learning to accurately encode multilevel graph features using only a small number of labelled samples. We introduce a positive sample compensation coefficient into the objective function to mitigate the impact of class imbalance. Experiments on two datasets demonstrated that our model outperformed four baseline methods, and new DTIs predicted by the SSLDTI model were verified against the DrugBank database.
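The abstract does not give the exact form of the objective function, so the following is only a minimal sketch of one common way a positive-sample weight can enter a loss: a binary cross-entropy in which the positive terms are scaled by a coefficient. The function names `weighted_bce` and `imbalance_ratio`, the negative-to-positive ratio heuristic, and the toy labels and scores are all assumptions made for illustration and are not taken from the SSLDTI paper.

```python
import math


def weighted_bce(y_true, y_prob, pos_weight, eps=1e-12):
    """Mean binary cross-entropy with positive terms scaled by pos_weight.

    Generic illustration only; not the objective defined in the paper.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def imbalance_ratio(y_true):
    """Assumed heuristic: number of negatives divided by number of positives."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    return neg / pos


# Toy data: one known interaction among eight drug-target pairs.
labels = [1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.3, 0.1, 0.2, 0.1, 0.05, 0.1, 0.2, 0.1]

w = imbalance_ratio(labels)
plain = weighted_bce(labels, scores, pos_weight=1.0)
weighted = weighted_bce(labels, scores, pos_weight=w)
```

With `pos_weight=1.0` the function reduces to ordinary binary cross-entropy. A larger coefficient makes an under-scored known interaction contribute more to the loss, which is the general idea behind compensating for scarce positives.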