Zandigohar Mehrdad, Rehman Jalees, Dai Yang
Department of Biomedical Engineering, University of Illinois Chicago, Chicago, Illinois, United States.
Department of Biochemistry and Molecular Genetics, University of Illinois College of Medicine, Chicago, Illinois, United States.
bioRxiv. 2025 May 5:2025.04.17.649372. doi: 10.1101/2025.04.17.649372.
Accurately inferring transcription factor (TF) activity from single-cell RNA sequencing (scRNA-seq) data remains a fundamental challenge in computational biology. Existing methods rely on statistical models, motif enrichment, or prior-based inference, and they often depend on deterministic assumptions about regulatory relationships and on static regulatory databases. Moreover, few approaches effectively integrate prior biological knowledge with data-driven inference to capture novel, dynamic, and context-specific regulatory interactions.
To address these limitations, we develop scRegulate, a generative deep learning framework that uses variational inference to infer TF activities while incorporating gene regulatory network (GRN) priors. By integrating structured biological constraints with a probabilistic latent space model, scRegulate offers a scalable and biologically interpretable approach to predicting regulatory interactions from scRNA-seq data. We comprehensively benchmark scRegulate on multiple public experimental datasets and on synthetic datasets generated with GRouNdGAN, demonstrating that it infers TF activities and GRNs consistent with the underlying ground-truth regulatory interactions. scRegulate outperforms existing TF inference methods, achieving AUROC values of 0.71-0.86 and AUPRC values of 0.80-0.95 on three synthetic datasets. Additionally, scRegulate accurately recapitulates experimentally validated TF knockdown effects on a Perturb-seq dataset, with mean log2 fold changes ranging from -0.61 to -18.92 (p ≤ 8.06×10) for key TFs such as ELK1, EGR1, and CREB1. Applied to PBMC scRNA-seq data, scRegulate reconstructs cell-type-specific GRNs and identifies differentially active TFs that align with known immune regulatory pathways. Furthermore, we show that scRegulate's TF embeddings capture meaningful transcriptional heterogeneity, enabling accurate clustering of cell types. Collectively, our results establish scRegulate as a powerful, interpretable, and scalable framework for inferring TF activities and regulatory networks from single-cell transcriptomics.
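For readers unfamiliar with the benchmark metrics, the following is a minimal sketch of how AUROC and AUPRC can be computed for predicted TF-gene edges against a ground-truth network. It is not scRegulate's evaluation code: the edge names, scores, and the helper functions `auroc` and `average_precision` are hypothetical and serve only to illustrate the two metrics, with average precision used as a common step-wise estimate of AUPRC.

```python
def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity, with average ranks for ties."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank over a run of tied scores
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative edge")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Average precision: mean of precision values at each true edge, ranked by score."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive edge")
    order = sorted(range(len(scores)), key=lambda idx: -scores[idx])
    hits = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical toy network: 1 = edge present in the ground truth, 0 = absent.
truth = {("TF1", "geneA"): 1, ("TF1", "geneB"): 0,
         ("TF2", "geneA"): 1, ("TF2", "geneC"): 0}
# Hypothetical inferred edge weights for the same TF-gene pairs.
predicted = {("TF1", "geneA"): 0.9, ("TF1", "geneB"): 0.8,
             ("TF2", "geneA"): 0.7, ("TF2", "geneC"): 0.1}

edges = sorted(truth)
labels = [truth[e] for e in edges]
scores = [predicted[e] for e in edges]

print(f"AUROC = {auroc(labels, scores):.3f}")
print(f"AUPRC (average precision) = {average_precision(labels, scores):.3f}")
```

In this toy case one false edge outranks one true edge, so both metrics fall below 1. On real benchmarks the positive class (true edges) is usually rare, which is one reason AUPRC is reported alongside AUROC.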