Zhou Jian, Troyanskaya Olga G
Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America; Graduate Program in Quantitative and Computational Biology, Princeton University, Princeton, New Jersey, United States of America.
Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America; Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America.
PLoS Comput Biol. 2014 Mar 27;10(3):e1003525. doi: 10.1371/journal.pcbi.1003525. eCollection 2014 Mar.
Chromatin is the driver of gene regulation, yet understanding the molecular interactions underlying chromatin factor combinatorial patterns (or the "chromatin codes") remains a fundamental challenge in chromatin biology. Here we developed a global modeling framework that leverages chromatin profiling data to produce a systems-level view of the macromolecular complex of chromatin. Our model ultilizes maximum entropy modeling with regularization-based structure learning to statistically dissect dependencies between chromatin factors and produce an accurate probability distribution of chromatin code. Our unsupervised quantitative model, trained on genome-wide chromatin profiles of 73 histone marks and chromatin proteins from modENCODE, enabled making various data-driven inferences about chromatin profiles and interactions. We provided a highly accurate predictor of chromatin factor pairwise interactions validated by known experimental evidence, and for the first time enabled higher-order interaction prediction. Our predictions can thus help guide future experimental studies. The model can also serve as an inference engine for predicting unknown chromatin profiles--we demonstrated that with this approach we can leverage data from well-characterized cell types to help understand less-studied cell type or conditions.
染色质是基因调控的驱动因素,但了解染色质因子组合模式(即“染色质密码”)背后的分子相互作用仍然是染色质生物学中的一项基本挑战。在此,我们开发了一个全局建模框架,该框架利用染色质分析数据来生成染色质大分子复合物的系统级视图。我们的模型利用基于正则化结构学习的最大熵建模,从统计学角度剖析染色质因子之间的依赖性,并生成染色质密码的精确概率分布。我们的无监督定量模型以来自modENCODE的73种组蛋白标记和染色质蛋白的全基因组染色质图谱进行训练,能够对染色质图谱和相互作用进行各种数据驱动的推断。我们提供了一个经已知实验证据验证的染色质因子成对相互作用的高精度预测器,并首次实现了高阶相互作用预测。因此,我们的预测有助于指导未来的实验研究。该模型还可作为预测未知染色质图谱的推理引擎——我们证明,通过这种方法,我们可以利用来自特征明确的细胞类型的数据,来帮助理解研究较少的细胞类型或条件。