Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, USA.
Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
BMC Bioinformatics. 2023 Apr 4;24(1):127. doi: 10.1186/s12859-023-05250-y.
Characterizing the topology of gene regulatory networks (GRNs) is a fundamental problem in systems biology. The advent of single cell technologies has made it possible to construct GRNs at a finer resolution than bulk and microarray datasets allow. However, the cellular heterogeneity and sparsity of single cell datasets invalidate the standard Gaussian assumptions used for constructing GRNs. Additionally, most GRN reconstruction approaches estimate a single network for the entire dataset. This can lose information when single cell datasets are generated from multiple treatment conditions or disease states.
To better characterize single cell GRNs under different but related conditions, we propose the joint estimation of multiple networks using multiple signed graph learning (scMSGL). The proposed method builds on recently developed graph learning techniques from graph signal processing (GSP), where GRNs and gene expressions are modeled as signed graphs and graph signals, respectively. scMSGL learns multiple GRNs by optimizing the total variation of gene expressions with respect to the GRNs, while regularization with respect to a learned signed consensus graph ensures that the learned GRNs are similar to each other. We further kernelize scMSGL, with the kernel selected to suit the structure of single cell data.
scMSGL is shown to outperform existing state-of-the-art methods in GRN recovery on simulated datasets. Furthermore, scMSGL successfully identifies well-established regulators in a mouse embryonic stem cell differentiation study and a cancer clinical study of medulloblastoma.
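To make the idea of total variation on a signed graph concrete, the following is a minimal sketch and not the authors' implementation. It assumes the common GSP convention that expression profiles should vary little across positive (activating) edges and vary strongly across negative (inhibiting) edges, measured with Laplacian quadratic forms. The function names, the toy data, and the L1 form of the consensus penalty are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def laplacian(weights):
    """Combinatorial Laplacian D - W of a nonnegative symmetric weight matrix."""
    return np.diag(weights.sum(axis=1)) - weights


def signed_total_variation(X, A):
    """Total variation of signals X (genes x cells) on a signed graph A.

    Assumption: A is symmetric with zero diagonal. Positive entries are
    treated as activating edges, negative entries as inhibiting edges.
    Lower values mean the signals fit the signed graph better.
    """
    A_pos = np.maximum(A, 0.0)   # weights of positive edges
    A_neg = np.maximum(-A, 0.0)  # magnitudes of negative edges
    tv_pos = np.trace(X.T @ laplacian(A_pos) @ X)  # small if similar across + edges
    tv_neg = np.trace(X.T @ laplacian(A_neg) @ X)  # large if dissimilar across - edges
    return tv_pos - tv_neg


def consensus_penalty(graphs, consensus):
    """Hypothetical L1 penalty tying each condition's graph to a consensus graph."""
    return sum(np.abs(A - consensus).sum() for A in graphs)


# Toy data: 3 genes measured in 2 cells.
# Genes 0 and 1 are co-expressed; gene 2 is anti-correlated with gene 0.
X = np.array([[1.0, 2.0],
              [1.0, 2.0],
              [-1.0, -2.0]])

# Signed graph matching the data: edge 0-1 is positive, edge 0-2 is negative.
A_good = np.array([[0.0, 1.0, -1.0],
                   [1.0, 0.0, 0.0],
                   [-1.0, 0.0, 0.0]])
A_bad = -A_good  # same edges with the signs flipped

tv_good = signed_total_variation(X, A_good)
tv_bad = signed_total_variation(X, A_bad)
print(tv_good, tv_bad)
```

On this toy data the correctly signed graph gives a lower objective value than the sign-flipped one, which is the behavior a learning procedure of this kind would exploit. In a joint setting, a term such as `consensus_penalty` would be added to the sum of per-condition objectives so that the condition-specific graphs stay close to a shared one.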