Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA.
Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA.
Bioinformatics. 2022 May 26;38(11):3011-3019. doi: 10.1093/bioinformatics/btac288.
Elucidating the topology of gene regulatory networks (GRNs) from large single-cell RNA sequencing datasets, while effectively capturing their inherent cell-cycle heterogeneity and dropouts, is currently one of the most pressing problems in computational systems biology. Recently, graph learning (GL) approaches based on graph signal processing have been developed to infer graph topology from signals defined on graphs. However, existing GL methods are not suitable for learning signed graphs, which are characteristic of GRNs because they can account for both activating and inhibitory relationships in the gene network. These methods also cannot handle the high proportion of zero values present in single-cell datasets.
To this end, we propose a novel signed GL approach, scSGL, that learns GRNs under the assumption that gene expression is smooth over activating edges and non-smooth over inhibitory edges. scSGL is then extended with kernels to account for non-linearity in co-expression and to handle the frequently occurring zero values effectively. The proposed approach is formulated as a non-convex optimization problem and solved using an efficient alternating direction method of multipliers (ADMM) framework. Performance assessment on simulated datasets shows that kernelized scSGL outperforms existing state-of-the-art methods in GRN recovery. The performance of scSGL is further investigated using human and mouse embryonic datasets.
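The smoothness assumption can be illustrated with a small numerical sketch. The code below is not taken from the scSGL repository; it assumes one common way to quantify smoothness, the Laplacian quadratic form tr(XᵀLX), and evaluates it separately on the positive (activating) and negative (inhibitory) parts of a signed adjacency matrix. The function names `laplacian` and `signed_smoothness`, the toy expression matrix `X` (genes by cells), and the toy signed adjacency `W` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def laplacian(W):
    """Combinatorial Laplacian L = D - W of a nonnegative symmetric adjacency W."""
    return np.diag(W.sum(axis=1)) - W


def signed_smoothness(X, W):
    """Return the Laplacian quadratic forms on the positive and negative edge sets.

    X : (genes x cells) expression matrix (toy data here, an assumption).
    W : (genes x genes) symmetric signed adjacency; positive entries stand for
        activating edges, negative entries for inhibitory edges.
    """
    W_pos = np.maximum(W, 0.0)   # weights of activating edges
    W_neg = np.maximum(-W, 0.0)  # magnitudes of inhibitory edges
    pos = np.trace(X.T @ laplacian(W_pos) @ X)  # small when signals are smooth
    neg = np.trace(X.T @ laplacian(W_neg) @ X)  # large when signals are non-smooth
    return pos, neg


# Toy example: genes 0 and 1 have identical profiles and share an activating edge;
# gene 2 moves opposite to gene 0 and shares an inhibitory edge with it.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, 3.0],
              [3.0, 2.0, 1.0]])
W = np.array([[0.0, 1.0, -1.0],
              [1.0, 0.0, 0.0],
              [-1.0, 0.0, 0.0]])

pos, neg = signed_smoothness(X, W)
print(pos, neg)  # 0.0 on the activating edge, 8.0 on the inhibitory edge
```

In this toy case the activating edge connects identical profiles, so its term is zero, while the inhibitory edge connects opposing profiles, so its term is large. A signed graph learner built on this assumption would favor edge signs that make the first term small and the second term large.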
The scSGL code and analysis scripts are available at https://github.com/Single-Cell-Graph-Learning/scSGL.
Supplementary data are available at Bioinformatics online.