Department of Automation, Xiamen University, Xiamen, Fujian, China.
National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, China.
BMC Genomics. 2022 Nov 30;23(1):782. doi: 10.1186/s12864-022-09020-7.
Identifying gene regulatory networks (GRNs) helps to explain the molecular mechanisms underlying various biological processes and complex diseases. With single-cell RNA sequencing data now available, it is essential to infer GRNs from single-cell expression. Some GRN methods originally developed for bulk expression data can be applied to single-cell data, and several single-cell-specific GRN algorithms have been developed. Nevertheless, recent benchmarking studies have emphasized the need to develop more accurate and robust GRN modeling methods that are compatible with single-cell expression data.
We present SRGS, a recursive gene selection method based on sparse partial least squares (SPLS), to infer GRNs from bulk or single-cell expression data. For each target gene under consideration, SRGS uses SPLS to recursively select and score the genes that may regulate it. When dealing with gene expression data with dropouts, we randomly scramble samples, set some values in the expression matrix to zero, and generate multiple copies of the data over multiple iterations to make SRGS more robust. We tested SRGS on different kinds of expression data, including simulated bulk data, simulated single-cell data without and with dropouts, and experimental single-cell data. We also compared it with existing GRN methods, including those originally developed for bulk data, those developed specifically for single-cell data, and those recommended by recent benchmarking studies.
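The data-copy step for dropout-affected data can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `make_dropout_copies`, the parameters `n_copies` and `dropout_rate`, and the choice of a uniform random mask are all assumptions made for illustration, since the abstract does not state how many copies are generated or how the zeroed entries are chosen.

```python
import numpy as np


def make_dropout_copies(expr, n_copies=5, dropout_rate=0.1, seed=0):
    """Return perturbed copies of a (samples x genes) expression matrix.

    Hypothetical helper: the name, parameters, and defaults are
    illustrative assumptions, not taken from SRGS.
    """
    rng = np.random.default_rng(seed)
    expr = np.asarray(expr, dtype=float)
    copies = []
    for _ in range(n_copies):
        # Scramble the order of samples (rows); each row stays intact.
        shuffled = expr[rng.permutation(expr.shape[0])]
        # Set a random subset of entries to zero (assumed uniform mask).
        mask = rng.random(shuffled.shape) < dropout_rate
        copies.append(np.where(mask, 0.0, shuffled))
    return copies


# Toy matrix: 5 samples x 4 genes, all values nonzero.
expr = np.arange(1, 21, dtype=float).reshape(5, 4)
copies = make_dropout_copies(expr, n_copies=3, dropout_rate=0.2, seed=42)
```

Each copy could then be passed to a gene-scoring routine and the resulting scores combined across copies; how SRGS combines them is not described in the abstract.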
The results show that SRGS is competitive with existing GRN methods and effective at inferring gene regulatory networks from bulk or single-cell gene expression data. SRGS is available at https://github.com/JGuan-lab/SRGS.