Intelligent Data Center, School of Mathematics, Sun Yat-Sen University, Guangzhou, 510275, China.
Intelligent Data Center, School of Mathematics, Sun Yat-Sen University, Guangzhou, 510275, China, and Pazhou Lab, Guangzhou, 510330, China.
Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab409.
Learning node representations is a fundamental problem in biological network analysis, as compact representation features reveal complicated network structures and carry useful information for downstream tasks such as link prediction and node classification. Recently, a growing number of networks that profile objects from different aspects have become available, providing the opportunity to learn about objects from multiple perspectives. However, the complex common and specific information across different networks poses challenges to node representation methods. Moreover, ubiquitous noise in networks calls for more robust representations. To deal with these problems, we present a representation learning method for multiple biological networks. First, we handle the noise and spurious edges in networks using denoised diffusion, providing robust connectivity structures for the subsequent representation learning. Then, we introduce a graph regularized integration model to combine the refined networks and compute common representation features. By using a regularized decomposition technique, the proposed model can effectively preserve the common structural properties of different networks and simultaneously accommodate their specific information, leading to a consistent representation. A simulation study shows the superiority of the proposed method on networks with different noise levels. Three network-based inference tasks, namely drug-target interaction prediction, gene function identification and fine-grained species categorization, are conducted using representation features learned by our method. These tasks involve biological networks at different scales and levels of sparsity. Experimental results on real-world data show that the proposed method performs robustly compared with alternative methods. Overall, by eliminating noise and integrating networks effectively, the proposed method is able to learn useful representations from multiple biological networks.
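The two-stage idea described above (diffuse each network, then compute features shared across networks) can be illustrated with a minimal sketch. This is not the authors' model: the abstract gives no formulas, so the sketch assumes random walk with restart as a stand-in for the denoised diffusion, and a plain shared low-rank factorization (truncated SVD of the column-stacked diffusion matrices) as a stand-in for the graph regularized integration. It omits the graph regularization and the network-specific terms entirely. The function names `rwr_diffusion` and `shared_embedding`, the `restart` and `dim` parameters, and the toy networks are all hypothetical.

```python
import numpy as np


def rwr_diffusion(adj, restart=0.5):
    """Random walk with restart (assumed stand-in for the paper's diffusion).

    Row i of the result is the stationary distribution of a walk that
    restarts at node i with probability `restart` at every step.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0  # avoid division by zero for isolated nodes
    trans = adj / deg    # row-normalized transition matrix
    # Closed form: S = r * (I - (1 - r) * P)^(-1)
    return restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * trans)


def shared_embedding(adjs, dim=2, restart=0.5):
    """Shared node features from several networks over the same node set.

    Minimizes sum_v ||S_v - U V_v^T||_F^2 with orthonormal U, which is
    solved by the top left singular vectors of [S_1, ..., S_m].
    This is a simplification, not the regularized model in the paper.
    """
    diffused = [rwr_diffusion(a, restart) for a in adjs]
    stacked = np.hstack(diffused)  # n x (n * number_of_networks)
    u, s, _ = np.linalg.svd(stacked, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])  # scale axes by singular values


# Toy data (hypothetical): two 6-node networks with the same two groups,
# {0, 1, 2} and {3, 4, 5}; the second network has one spurious edge (2-3).
net_a = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
net_b = net_a.copy()
net_b[2, 3] = net_b[3, 2] = 1

features = shared_embedding([net_a, net_b], dim=2)
print(features.shape)
```

The rows of `features` could then be passed to any downstream classifier or link predictor; which one suits the tasks named in the abstract is not something this sketch addresses.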