Salas Elisa N, Shu Jiang, Cserhati Matyas F, Weeks Donald P, Ladunga Istvan
Department of Statistics, University of Nebraska, Lincoln, NE 68583-0963, USA; Department of Biochemistry, University of Nebraska, Lincoln, NE 68588-0665, USA.
Department of Statistics, University of Nebraska, Lincoln, NE 68583-0963, USA.
Nucleic Acids Res. 2016 Jun 2;44(10):4595-609. doi: 10.1093/nar/gkw042. Epub 2016 Jan 28.
We present a theory of pluralistic and stochastic gene regulation. To bridge the gap between empirical studies and mathematical models, we integrate pre-existing observations with our meta-analyses of the ENCODE ChIP-Seq experiments. Earlier evidence includes fluctuations in the levels, location, activity, and binding of transcription factors, variable DNA motifs, and bursts in gene expression. Stochastic regulation is also indicated by the frequently subdued effects of knockout mutants of regulators, by evolutionary losses and gains of regulators, and by massive rewiring of regulatory sites. We report widespread pluralistic regulation in ≈800 000 tightly co-expressed pairs of diverse human genes. Typically, half of the ≈50 observed regulators bind to both genes reproducibly, twice as many as in independently expressed gene pairs. We also examine the largest set of co-expressed genes, which code for cytoplasmic ribosomal proteins. Numerous regulatory complexes are highly significantly enriched in ribosomal genes compared to highly expressed non-ribosomal genes. We could not find any DNA-associated, strict-sense master regulator. Despite major fluctuations in transcription factor binding, our machine learning model accurately predicted transcript levels using binding sites of 20+ regulators. Our pluralistic and stochastic theory is consistent with partially random binding patterns, redundancy, stochastic regulator binding, burst-like expression, degeneracy of binding motifs and massive regulatory rewiring during evolution.