School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China.
College of Agronomy and Biotechnology, China Agricultural University, Beijing, China.
BMC Bioinformatics. 2023 Apr 21;24(1):163. doi: 10.1186/s12859-023-05253-9.
Gene regulatory networks (GRNs) arise from the intricate interactions between transcription factors (TFs) and their target genes during the growth and development of organisms. Inferring GRNs can reveal the underlying gene interactions in living systems and facilitate investigation of the relationship between gene expression patterns and phenotypic traits. Several machine-learning models have been proposed for inferring GRNs from single-cell RNA sequencing (scRNA-seq) data, but some of them, such as Boolean and tree-based networks, are sensitive to noise and may struggle with the high dimensionality of real scRNA-seq data and the sparsity of gene regulatory relationships. Inferring large-scale GRNs therefore remains a formidable challenge.
This study proposes a multilevel, multi-structure framework, the pseudo-Siamese GRN (PSGRN), for inferring large-scale GRNs from time-series expression datasets. Built on a pseudo-Siamese network, PSGRN applies a gated recurrent unit to the TF matrix and to the target matrix to capture the temporal features of each, merges the two, and then applies the DenseNet framework to learn the spatial features of the merged matrices. Finally, a sigmoid function is applied to evaluate interactions. We constructed two maize sub-datasets, comprising gene expression levels and GRNs, from existing open-source maize multi-omics data, and on them compared PSGRN with other GRN inference methods: GENIE3, GRNBoost2, nonlinear ordinary differential equations, CNNC, and DGRNS. Our results show that PSGRN outperforms these state-of-the-art methods. PSGRN thus allows GRNs to be inferred from scRNA-seq data while elucidating the temporal and spatial features of TFs and their target genes. The results show the model's robustness and generalization, laying a theoretical foundation for maize genotype-phenotype associations with implications for breeding work.
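The data flow described above (two recurrent branches, a merge, densely connected layers, and a sigmoid output) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation. It assumes untrained random weights, bias-free GRU cells, one expression value per time point, and fully connected layers with DenseNet-style concatenation in place of the convolutional DenseNet applied to merged matrices in the paper. All names (`GRUBranch`, `PseudoSiameseSketch`, `growth`, and so on) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUBranch:
    """One recurrent branch: a single bias-free GRU cell run over a time series."""

    def __init__(self, rng, input_size, hidden_size):
        self.hidden_size = hidden_size
        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, n = candidate state.
        self.W = {g: rand_matrix(rng, hidden_size, input_size) for g in "zrn"}
        self.U = {g: rand_matrix(rng, hidden_size, hidden_size) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
        # Blend the previous state with the candidate state.
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]

    def encode(self, series):
        h = [0.0] * self.hidden_size
        for x in series:
            h = self.step(x, h)
        return h  # final hidden state summarises the series


class PseudoSiameseSketch:
    """Two branches with the same architecture but separate (unshared) weights."""

    def __init__(self, seed=0, input_size=1, hidden_size=4, growth=3, n_dense=2):
        rng = random.Random(seed)
        self.tf_branch = GRUBranch(rng, input_size, hidden_size)
        self.target_branch = GRUBranch(rng, input_size, hidden_size)
        width = 2 * hidden_size
        self.dense = []
        for _ in range(n_dense):
            # Each dense layer sees every feature produced so far.
            self.dense.append(rand_matrix(rng, growth, width))
            width += growth
        self.out = rand_matrix(rng, 1, width)

    def score(self, tf_series, target_series):
        # Merge the two branch encodings by concatenation.
        features = self.tf_branch.encode(tf_series) + self.target_branch.encode(target_series)
        for W in self.dense:
            new = [max(0.0, v) for v in matvec(W, features)]  # ReLU
            features = features + new  # dense connectivity: concatenate, not replace
        return sigmoid(matvec(self.out, features)[0])


# Made-up expression values at four time points, for illustration only.
tf_series = [[0.1], [0.4], [0.9], [1.3]]
target_series = [[0.0], [0.2], [0.5], [1.1]]

model = PseudoSiameseSketch(seed=0)
print(model.score(tf_series, target_series))  # value in (0, 1); meaningless until trained
```

Keeping separate weights in the two branches is what distinguishes a pseudo-Siamese network from a Siamese one, in which both inputs pass through a shared encoder. In practice the weights would be learned from labelled TF-target pairs rather than drawn at random.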