School of Computer Science, Xiangtan University, Xiangtan, 411105, China.
School of Computer Science and Engineering, Hunan Institute of Technology, Hengyang, 412002, China.
Interdiscip Sci. 2024 Dec;16(4):990-1004. doi: 10.1007/s12539-024-00633-y. Epub 2024 May 23.
Gene regulatory network (GRN) inference from single-cell RNA sequencing (scRNA-seq) data plays a crucial role in understanding the regulatory mechanisms between genes. Various computational methods have been applied to GRN inference, but their network accuracy and model generalization remain unsatisfactory, largely because of high-dimensional data and network sparsity. In this paper, we propose CVGAE, a self-supervised method for GRN inference from scRNA-seq data. CVGAE uses a graph neural network for inductive representation learning, merging gene expression data and the observed topology into a low-dimensional vector space. The trained vectors are used to calculate the distance between genes and, in turn, to predict interactions between them. Within the overall framework, FastICA is applied to relieve the computational complexity caused by high-dimensional data, and CVGAE adopts stacked GraphSAGE layers as the encoder together with an improved decoder to overcome network sparsity. CVGAE is evaluated on several single-cell datasets with four related ground-truth networks, and the results show that it achieves better performance than the compared methods. To validate its learning and generalization capabilities, CVGAE is also applied in a few-shot setting by changing the ratio of the training set to the test set. Under few-shot conditions, CVGAE obtains comparable or superior performance.
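The encode-then-score idea described in the abstract can be illustrated with a minimal sketch. It is not the CVGAE implementation: the weights are random rather than trained, the toy feature vectors stand in for FastICA-reduced expression profiles, the graph is treated as undirected, and the exponential distance-to-score mapping in `edge_score` is an assumption chosen for illustration, since the abstract does not specify the improved decoder. All function and variable names are hypothetical.

```python
import math
import random


def mean_aggregate_layer(features, neighbors, w_self, w_neigh):
    """One GraphSAGE-style layer: mean-aggregate neighbours, then ReLU."""
    out = []
    for node, h in enumerate(features):
        dim = len(h)
        nbrs = neighbors.get(node, [])
        if nbrs:
            # Average the neighbours' feature vectors, dimension by dimension.
            agg = [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
        else:
            # Isolated gene: no neighbourhood signal to aggregate.
            agg = [0.0] * dim
        row = []
        for j in range(len(w_self[0])):
            z = sum(h[d] * w_self[d][j] + agg[d] * w_neigh[d][j] for d in range(dim))
            row.append(max(0.0, z))  # ReLU
        out.append(row)
    return out


def random_matrix(rows, cols, rng):
    """Untrained weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def edge_score(z_u, z_v):
    """Assumed decoder: map Euclidean distance to a score in (0, 1]."""
    return math.exp(-math.dist(z_u, z_v))


rng = random.Random(0)

# Toy input: 4 genes with 3 features each (stand-in for reduced expression data).
features = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.3],
    [0.0, 0.7, 0.9],
]
# Observed topology as an adjacency list; gene 3 has no known interactions.
neighbors = {0: [1], 1: [0, 2], 2: [1], 3: []}

hidden_dim, out_dim = 4, 2
layers = [
    (random_matrix(3, hidden_dim, rng), random_matrix(3, hidden_dim, rng)),
    (random_matrix(hidden_dim, out_dim, rng), random_matrix(hidden_dim, out_dim, rng)),
]

# Stack the layers to obtain low-dimensional gene embeddings.
embeddings = features
for w_self, w_neigh in layers:
    embeddings = mean_aggregate_layer(embeddings, neighbors, w_self, w_neigh)

# Score a candidate interaction between gene 0 and gene 3.
score_0_3 = edge_score(embeddings[0], embeddings[3])
print(f"score(gene0, gene3) = {score_0_3:.3f}")
```

In a trained model, the weight matrices would be optimized so that observed edges receive high scores and non-edges low scores; a higher score here only means that the two untrained embeddings happen to lie closer together.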