Wang Shilong, Cui Hai, Qu Yanchen, Zhang Yijia
Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, 116026, Dalian, Liaoning, China.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae718.
Identifying biologically significant protein complexes from protein-protein interaction (PPI) networks and understanding their roles are essential for elucidating protein functions, life processes, and disease mechanisms. Current methods typically rely on static PPI networks and model PPI data as pairwise relationships, which introduces several limitations. First, static PPI networks do not adequately represent the spatial scopes and temporal dynamics of protein interactions. Second, many available biological resources have not been fully integrated. Moreover, PPIs in biological systems are not merely one-to-one relationships but also involve higher-order non-pairwise interactions. To alleviate these issues, we propose HGST, a multi-source biological knowledge-guided hypergraph spatiotemporal subnetwork (subnet) embedding method for identifying biologically significant protein complexes from PPI networks. HGST first constructs spatiotemporal PPI subnets using the scopes and temporal dynamics of proteins derived from multi-source biological knowledge, treating them as dynamic networks through fine-grained spatiotemporal partitioning. The spatiotemporal subnets are then transformed into hypergraphs so that higher-order non-pairwise relationships can be modeled via hypergraph embedding. Simultaneously, fine-grained amino acid sequence features and coarse-grained gene ontology attributes are introduced for multi-dimensional feature fusion. Finally, protein complexes are identified from the reweighted subnets, based on the fused feature representations, using the core-attachment strategy. Evaluations on four real PPI datasets demonstrate that HGST achieves competitive performance. Furthermore, a series of biological analyses confirms the high biological significance of the complexes identified by HGST. The source code is available at https://github.com/qifen37/HGST.
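The abstract describes the pipeline only at a high level. As a rough illustration of two of its steps, the following minimal Python sketch splits a static PPI edge list into (time point, compartment) subnets and then runs a generic core-attachment pass on one reweighted subnet. Everything here is an assumption for illustration and is not taken from the HGST code: the function names, the rule that both proteins must share a time point and a compartment, the cosine-similarity reweighting, and the `core_threshold` / `attach_ratio` thresholds are hypothetical. The hypergraph embedding and multi-dimensional feature fusion are omitted, with plain feature vectors standing in for the fused representations.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Edge = Tuple[str, str]


def spatiotemporal_subnets(
    edges: List[Edge],
    active_times: Dict[str, Set[int]],
    compartments: Dict[str, Set[str]],
) -> Dict[Tuple[int, str], Set[Edge]]:
    """Split a static PPI edge list into (time point, compartment) subnets.

    Hypothetical rule: edge (u, v) belongs to subnet (t, c) only if both
    proteins are active at time point t and both are localized to c.
    """
    subnets: Dict[Tuple[int, str], Set[Edge]] = {}
    for u, v in edges:
        shared_times = active_times.get(u, set()) & active_times.get(v, set())
        shared_comps = compartments.get(u, set()) & compartments.get(v, set())
        for t in shared_times:
            for c in shared_comps:
                subnets.setdefault((t, c), set()).add((u, v))
    return subnets


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def core_attachment(
    edges: Set[Edge],
    features: Dict[str, Sequence[float]],
    core_threshold: float = 0.5,
    attach_ratio: float = 0.5,
) -> List[FrozenSet[str]]:
    """Generic core-attachment clustering on a reweighted subnet (illustrative only)."""
    # Reweight each edge by the cosine similarity of the two proteins' feature vectors.
    adj: Dict[str, Dict[str, float]] = {}
    for u, v in edges:
        w = cosine(features[u], features[v])
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    complexes: List[FrozenSet[str]] = []
    # Visit seed proteins in order of decreasing weighted degree.
    for seed in sorted(adj, key=lambda n: -sum(adj[n].values())):
        # Core: the seed plus neighbors joined to it by a high-weight edge.
        core = {seed} | {n for n, w in adj[seed].items() if w >= core_threshold}
        if len(core) < 2:
            continue
        # Attachments: outside neighbors linked to at least attach_ratio of the core.
        candidates = {n for m in core for n in adj[m]} - core
        attachments = {
            n for n in candidates
            if sum(1 for m in core if m in adj[n]) / len(core) >= attach_ratio
        }
        cplx = frozenset(core | attachments)
        # Skip candidates already contained in a previously found complex.
        if not any(cplx <= found for found in complexes):
            complexes.append(cplx)
    return complexes


if __name__ == "__main__":
    # Toy data (made up): time points at which each protein is active, and its compartments.
    edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
    times = {"A": {1, 2}, "B": {1}, "C": {1, 2}, "D": {2}, "E": {2}}
    comps = {"A": {"nucleus"}, "B": {"nucleus"}, "C": {"nucleus"},
             "D": {"nucleus"}, "E": {"cytoplasm"}}
    feats = {"A": [1.0, 0.0], "B": [1.0, 0.1], "C": [0.9, 0.1], "D": [0.0, 1.0]}
    for key, sub in sorted(spatiotemporal_subnets(edges, times, comps).items()):
        print(key, [sorted(c) for c in core_attachment(sub, feats)])
```

In this toy run the edge D-E never appears in any subnet because the two proteins share no compartment, which is the kind of spatially implausible interaction the partitioning step is meant to filter out. Real core-attachment methods differ in how cores are grown and how overlapping candidates are merged; the choices above are kept deliberately simple.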