College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China.
School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi, China.
PLoS Comput Biol. 2023 Jun 20;19(6):e1011207. doi: 10.1371/journal.pcbi.1011207. eCollection 2023 Jun.
Interactions between transcription factors and target genes form the main part of the human gene regulatory network, and they remain a complicating factor in biological research. Specifically, for nearly half of the interactions recorded in established databases, the interaction type has yet to be confirmed. Although several computational methods exist to predict gene interactions and their types, no method is yet available that predicts them based solely on topology information. To this end, we propose a graph-based prediction model called KGE-TGI, trained in a multi-task learning manner on a knowledge graph that we constructed specifically for this problem. The KGE-TGI model relies on topology information rather than being driven by gene expression data. In this paper, we formulate the task of predicting the interaction types of transcription factors and target genes as a multi-label classification problem for link types on a heterogeneous graph, coupled with a second, inherently related link prediction problem. We constructed a ground truth dataset as a benchmark and evaluated the proposed method on it. In 5-fold cross-validation experiments, the proposed method achieved average AUC values of 0.9654 and 0.9339 on the link prediction and link type classification tasks, respectively. In addition, the results of a series of comparison experiments show that introducing knowledge information significantly benefits prediction and that our method achieves state-of-the-art performance on this problem.
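The multi-task formulation described in the abstract, link prediction coupled with multi-label link type classification, can be illustrated with a minimal sketch. This is not the KGE-TGI architecture, which the abstract does not detail. It assumes fixed node embeddings, a dot-product score for link existence, and a DistMult-style score per interaction type. The type names `activation` and `repression`, the function names, and the `weight` parameter are all hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce(p, y, eps=1e-12):
    # Binary cross-entropy for one probability p and one 0/1 label y.
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def link_score(h, t):
    # Task 1 (assumed scorer): probability that a TF -> gene link exists,
    # taken as the sigmoid of the embedding dot product.
    return sigmoid(sum(a * b for a, b in zip(h, t)))

def type_scores(h, t, relations):
    # Task 2 (assumed scorer): one independent probability per link type,
    # using a DistMult-style score sum_i h_i * r_i * t_i. Independent
    # sigmoids make this multi-label: several types may be active at once.
    return {
        name: sigmoid(sum(a * r * b for a, r, b in zip(h, rel, t)))
        for name, rel in relations.items()
    }

def multitask_loss(h, t, relations, link_label, type_labels, weight=1.0):
    # Joint objective: link loss plus a weighted type loss. The type loss
    # is only counted for pairs that actually interact.
    loss = bce(link_score(h, t), link_label)
    if link_label == 1:
        probs = type_scores(h, t, relations)
        type_loss = sum(bce(probs[n], type_labels[n]) for n in relations)
        loss += weight * type_loss / len(relations)
    return loss

# Hypothetical 3-dimensional embeddings for one TF, one gene, two link types.
tf_vec = [0.5, -0.2, 0.8]
gene_vec = [0.4, 0.1, 0.9]
relations = {
    "activation": [1.0, 0.5, 1.2],
    "repression": [-1.0, 0.3, -0.7],
}
labels = {"activation": 1, "repression": 0}
loss = multitask_loss(tf_vec, gene_vec, relations, 1, labels)
```

In a trained model the embeddings and relation vectors would be learned by minimizing this loss over all labelled pairs. Sharing the node embeddings between the two scorers is what lets the link prediction task inform the type classification task.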