Siahpirani Alireza Fotuhi, McCalla Sunnie Grace, Pyne Saptarshi, Dillingham Caleb, Sridharan Rupa, Roy Sushmita
Wisconsin Institute for Discovery, University of Wisconsin-Madison.
Department of Computer Sciences, University of Wisconsin-Madison.
bioRxiv. 2025 Jun 13:2025.06.09.658650. doi: 10.1101/2025.06.09.658650.
Reconstructing genome-scale gene regulatory networks (GRNs) remains a difficult problem in systems biology, and many experimental and computational methods have been developed to address it. Recent computational methods have aimed to model GRNs more accurately by estimating hidden transcription factor activity (TFA) from prior knowledge of transcription factor (TF)-target regulatory connections, encoded as an input directed graph. This relaxes the assumption that a regulator's mRNA level correlates with its protein activity. However, noise in the prior knowledge can adversely affect the estimated TFA levels and the quality of the downstream inferred GRNs. Here, we present a new approach, MERLIN+P+TFA, that uses prior knowledge-guided sparsity regularization to robustly and accurately estimate TFA and downstream GRNs. We apply our method to simulated and real expression data in yeast and mammalian systems and show improved quality of inferred GRNs for both bulk and single-cell datasets. Regularized TFA offers benefits to a variety of other GRN inference algorithms, including those that have traditionally been used with expression alone, in both bulk and scRNA-seq settings. We use the inferred GRN to prioritize key regulators for the mouse embryonic stem cell (mESC) state and validate 58 regulators experimentally. We identify both known and novel regulators of the mESC state and further validate the targets of 4 known and novel regulators. Our validation experiments suggest that computationally inferred networks can capture functional targets of TFs with higher precision than estimated in current benchmarks; however, it is important to generate context-specific gold standards.
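To make the idea of prior-guided sparsity regularization concrete, the following is a minimal sketch of a generic TFA estimator. It is not the MERLIN+P+TFA algorithm, whose details are not given in this abstract. It assumes a factorization of the expression matrix X (genes by samples) into TF-target loadings W and TF activities A, with an L1 penalty on W that is lighter for edges present in the prior graph. The function names, the penalty values `lam_prior` and `lam_other`, and the alternating update scheme are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def estimate_tfa(X, prior, lam_prior=0.01, lam_other=1.0,
                 ridge=1e-3, n_iter=50, seed=0):
    """Hypothetical sketch: factor X ~ W @ A with a prior-weighted L1 penalty on W.

    X     : (n_genes, n_samples) expression matrix
    prior : (n_genes, n_tfs) binary matrix, 1 where the prior graph has a TF->gene edge
    Returns W (n_genes, n_tfs) loadings and A (n_tfs, n_samples) estimated activities.
    """
    rng = np.random.default_rng(seed)
    n_tfs = prior.shape[1]
    # Start from the prior support with slightly perturbed weights.
    W = prior.astype(float) * rng.normal(1.0, 0.1, size=prior.shape)
    # Assumption: edges in the prior are penalized less than edges outside it.
    lam = np.where(prior > 0, lam_prior, lam_other)
    eye = np.eye(n_tfs)
    # Initial activities: ridge regression of X on W.
    A = np.linalg.solve(W.T @ W + ridge * eye, W.T @ X)
    for _ in range(n_iter):
        # W-step: one proximal gradient step on 0.5*||X - W A||_F^2 + sum(lam * |W|).
        step = 1.0 / (np.linalg.norm(A @ A.T, 2) + 1e-12)
        grad = (W @ A - X) @ A.T
        W = soft_threshold(W - step * grad, step * lam)
        # A-step: ridge regression of X on the updated W.
        A = np.linalg.solve(W.T @ W + ridge * eye, W.T @ X)
    return W, A

if __name__ == "__main__":
    # Synthetic example: 30 genes, 3 TFs, 20 samples, with a noisy prior.
    rng = np.random.default_rng(1)
    true_support = (rng.random((30, 3)) < 0.3).astype(float)
    W_true = true_support * rng.normal(1.0, 0.2, size=(30, 3))
    A_true = rng.normal(size=(3, 20))
    X = W_true @ A_true + 0.05 * rng.normal(size=(30, 20))
    noisy_prior = true_support.copy()
    noisy_prior[rng.random((30, 3)) < 0.05] = 1.0  # add a few spurious prior edges
    W_hat, A_hat = estimate_tfa(X, noisy_prior)
    print("estimated activity matrix shape:", A_hat.shape)
```

Because the penalty only shrinks prior edges rather than fixing them, a spurious prior edge can still be driven toward zero when the data do not support it, which is the intuition behind regularizing against noise in the prior.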