School of Computer Science and Technology, Donghua University, Shanghai 201620, China.
School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China.
Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae291.
Gene regulatory networks (GRNs) encode gene regulation in living organisms and have become a critical tool for understanding complex biological processes. However, because gene regulation is dynamic and complex, inferring GRNs from single-cell RNA sequencing (scRNA-seq) data remains a challenging task. Existing computational methods usually focus on close connections between genes and ignore the global structure and distal regulatory relationships.
In this study, we develop a supervised deep learning framework, IGEGRNS, to infer GRNs from scRNA-seq data based on graph embedding. In the framework, contextual information of genes is captured by GraphSAGE, which aggregates gene features and neighborhood structures to generate low-dimensional embeddings for genes. Then, the k most influential nodes in the whole graph are selected through Top-k pooling. Finally, potential regulatory relationships between genes are predicted by stacked CNNs. Compared with nine competing supervised and unsupervised methods, our method achieves better performance on six time-series scRNA-seq datasets.
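The first two stages described above (neighborhood aggregation, then Top-k node selection) can be illustrated with a minimal sketch. This is not the IGEGRNS implementation: it assumes mean aggregation for the GraphSAGE-style layer and the common Top-k pooling formulation (score each node by projecting its embedding onto a vector p, keep the k highest-scoring nodes, gate them by tanh of the score). The toy graph, the weights, and the names `sage_mean_layer` and `top_k_pool` are hypothetical, and the final CNN stage is omitted.

```python
import math

def sage_mean_layer(features, neighbors, w_self, w_neigh):
    """One GraphSAGE-style layer (assumed mean aggregator, ReLU activation)."""
    dim_in = len(features[0])
    dim_out = len(w_self[0])
    out = []
    for i, x in enumerate(features):
        nbrs = neighbors[i]
        # Average the neighbors' features; a node with no neighbors gets zeros.
        if nbrs:
            mean = [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim_in)]
        else:
            mean = [0.0] * dim_in
        # Combine the node's own features with the aggregated neighborhood.
        h = []
        for k in range(dim_out):
            s = sum(x[d] * w_self[d][k] for d in range(dim_in))
            s += sum(mean[d] * w_neigh[d][k] for d in range(dim_in))
            h.append(max(0.0, s))
        out.append(h)
    return out

def top_k_pool(embeddings, p, k):
    """Keep the k nodes with the highest projection score h . p / ||p||."""
    norm = math.sqrt(sum(v * v for v in p))
    scores = [sum(a * b for a, b in zip(h, p)) / norm for h in embeddings]
    keep = sorted(range(len(embeddings)), key=lambda i: scores[i], reverse=True)[:k]
    # Gate the retained embeddings by tanh(score).
    pooled = [[v * math.tanh(scores[i]) for v in embeddings[i]] for i in keep]
    return keep, pooled

# Hypothetical toy graph: 4 genes, 2 expression-derived features each.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
neighbors = [[1, 2], [0], [0, 3], [2]]   # adjacency list of prior gene links
w_self = [[1.0, 0.0], [0.0, 1.0]]        # illustrative weights, not learned
w_neigh = [[0.5, 0.0], [0.0, 0.5]]
p = [1.0, 1.0]                           # illustrative projection vector

embeddings = sage_mean_layer(features, neighbors, w_self, w_neigh)
keep, pooled = top_k_pool(embeddings, p, k=2)
print("embeddings:", embeddings)
print("retained gene indices:", keep)
```

In a trained model the weights and the projection vector would be learned, and the pooled embeddings would feed the downstream classifier that scores candidate regulator-target pairs.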
Our method IGEGRNS is implemented in Python using the PyTorch machine learning library, and it is freely available at https://github.com/DHUDBlab/IGEGRNS.