College of Computer, National University of Defense Technology, China.
National Supercomputer Center in Tianjin, China.
Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab491.
Understanding chemical-gene interactions (CGIs) is crucial for drug screening. Wet-lab experiments are usually costly and laborious, which limits relevant studies to a small scale. In contrast, computational studies enable efficient in-silico exploration. For the CGI prediction problem, a common approach is to perform systematic analyses on a heterogeneous network involving various biomedical entities. Recently, graph neural networks have become popular in the field of relation prediction. However, the inherent heterogeneous complexity of biological interaction networks and the massive amount of data pose enormous challenges. This paper aims to develop a data-driven model that is capable of learning latent information from the interaction network and making correct predictions.
We developed BioNet, a deep biological network model with a graph encoder-decoder architecture. The graph encoder uses graph convolution to learn latent information embedded in complex interactions among chemicals, genes, diseases and biological pathways. The learning process consists of two consecutive steps. The embedded information learnt by the encoder is then used to make multi-type interaction predictions between chemicals and genes with a tensor decomposition decoder based on the RESCAL algorithm. BioNet includes 79 325 entities as nodes and 34 005 501 relations as edges. To train such a massive deep graph model, BioNet introduces a parallel training algorithm that uses multiple Graphics Processing Units (GPUs). The evaluation experiments indicated that BioNet exhibits outstanding prediction performance, with a best area under the receiver operating characteristic (ROC) curve of 0.952, which significantly surpasses state-of-the-art methods. For further validation, the top CGIs predicted by BioNet for cancer and COVID-19 were verified against externally curated data and published literature.
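To make the decoder step more concrete, the following is a minimal sketch of RESCAL-style scoring, not BioNet's actual implementation. It assumes the encoder has already produced one embedding vector per chemical and per gene, and that each interaction type has its own square relation matrix, so that the score for a (chemical, relation, gene) triple is the bilinear form e_chem^T R e_gene. The function names, the relation-type names, the use of a sigmoid to turn scores into probabilities, and the toy 2-dimensional embeddings are all illustrative assumptions.

```python
import math


def rescal_score(e_chem, rel_matrix, e_gene):
    """Bilinear RESCAL score e_chem^T * R * e_gene for one relation type."""
    return sum(
        e_chem[i] * rel_matrix[i][j] * e_gene[j]
        for i in range(len(e_chem))
        for j in range(len(e_gene))
    )


def predict_proba(e_chem, rel_matrices, e_gene):
    """Return {relation type: sigmoid(score)} for one chemical-gene pair.

    Applying a sigmoid to the score is an assumption of this sketch.
    """
    return {
        name: 1.0 / (1.0 + math.exp(-rescal_score(e_chem, matrix, e_gene)))
        for name, matrix in rel_matrices.items()
    }


# Toy embeddings; in practice these would come from the graph encoder.
chemical = [1.0, 0.0]
gene = [0.0, 1.0]

# One relation matrix per interaction type (hypothetical type names).
relations = {
    "increases_expression": [[0.0, 2.0], [0.0, 0.0]],
    "decreases_expression": [[1.0, 0.0], [0.0, 1.0]],
}

for name, prob in predict_proba(chemical, relations, gene).items():
    print(f"{name}: {prob:.3f}")
```

Because every interaction type has its own relation matrix while the entity embeddings are shared, a single pair of embeddings yields one score per type, which is what allows multi-type prediction for the same chemical-gene pair.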