A graphical modelling approach to the dissection of highly correlated transcription factor binding site profiles.

Affiliation

Cambridge Systems Biology Centre, University of Cambridge, Cambridge, United Kingdom.

Publication information

PLoS Comput Biol. 2012;8(11):e1002725. doi: 10.1371/journal.pcbi.1002725. Epub 2012 Nov 8.

Abstract

Inferring the combinatorial regulatory code of transcription factors (TFs) from genome-wide TF binding profiles is challenging. A major reason is that TF binding profiles significantly overlap and are therefore highly correlated. Clustered occurrence of multiple TFs at genomic sites may arise from chromatin accessibility and local cooperation between TFs, or binding sites may simply appear clustered if the profiles are generated from diverse cell populations. Overlaps in TF binding profiles may also result from measurements taken at closely related time intervals. It is thus of great interest to distinguish TFs that directly regulate gene expression from those that are indirectly associated with gene expression. Graphical models, in particular Bayesian networks, provide a powerful mathematical framework to infer different types of dependencies. However, existing methods do not perform well when the features (here: TF binding profiles) are highly correlated, when their association with the biological outcome is weak, and when the sample size is small. Here, we develop a novel computational method, the Neighbourhood Consistent PC (NCPC) algorithms, which deal with these scenarios much more effectively than existing methods do. We further present a novel graphical representation, the Direct Dependence Graph (DDGraph), to better display the complex interactions among variables. NCPC and DDGraph can also be applied to other problems involving highly correlated biological features. Both methods are implemented in the R package ddgraph, available as part of Bioconductor (http://bioconductor.org/packages/2.11/bioc/html/ddgraph.html). Applied to real data, our method identified TFs that specify different classes of cis-regulatory modules (CRMs) in Drosophila mesoderm differentiation. 
Our analysis also found that binding of the early transcription factor Twist is depleted at the CRMs regulating expression in visceral and somatic muscle cells at later stages, which suggests a CRM-specific repression mechanism that has not so far been characterised for this class of mesodermal CRMs.
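The abstract's key distinction, between TFs that directly regulate expression and TFs that are only indirectly associated with it, rests on conditional independence. The sketch below is not NCPC and does not use the ddgraph package. It is a minimal Python illustration, assuming binary binding features and a plain G-test, of the kind of conditional-independence test that PC-style algorithms are built on. The feature names `A`, `B` and `Y`, the helper functions and the toy counts are all hypothetical.

```python
"""Illustrative sketch only (not the NCPC algorithm): a G-test of conditional
independence between binary TF-binding features and an outcome."""
import math
from collections import Counter
from itertools import product


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite series.
        return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: erfc term plus a finite series in half-integer powers.
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(
        half ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1)
    )
    return tail


def g_test(rows, x, y, given=None):
    """Test X independent of Y (optionally given Z) for binary columns.

    rows: list of dicts mapping a feature name to 0/1.
    Returns (G statistic, degrees of freedom, p-value).
    """
    strata = {}
    for r in rows:
        key = r[given] if given is not None else None
        strata.setdefault(key, []).append(r)
    g, df = 0.0, 0
    for stratum in strata.values():
        n = len(stratum)
        joint = Counter((r[x], r[y]) for r in stratum)
        px = Counter(r[x] for r in stratum)
        py = Counter(r[y] for r in stratum)
        for a, b in product((0, 1), repeat=2):
            observed = joint[(a, b)]
            if observed:  # empty cells contribute 0 to G
                expected = px[a] * py[b] / n
                g += 2.0 * observed * math.log(observed / expected)
        df += 1  # (2 - 1) * (2 - 1) per stratum, assuming both levels occur
    return g, df, chi2_sf(g, df)


def make_rows(counts):
    """Expand {(a, b, y): count} into a list of records."""
    return [{"A": a, "B": b, "Y": y} for (a, b, y), c in counts.items() for _ in range(c)]


# Hypothetical toy data: TF A co-binds with TF B, and only B is linked to the
# CRM class Y. Within each level of B, A and Y are exactly independent.
COUNTS = {
    (1, 1, 1): 32, (1, 1, 0): 8, (0, 1, 1): 8, (0, 1, 0): 2,  # B bound
    (1, 0, 1): 2, (1, 0, 0): 8, (0, 0, 1): 8, (0, 0, 0): 32,  # B unbound
}
rows = make_rows(COUNTS)

_, _, p_marginal = g_test(rows, "A", "Y")              # A looks associated with Y...
_, _, p_given_b = g_test(rows, "A", "Y", given="B")    # ...but not once B is known
_, _, p_b_given_a = g_test(rows, "B", "Y", given="A")  # B stays associated given A
print(f"A-Y: p={p_marginal:.2g}; A-Y|B: p={p_given_b:.2g}; B-Y|A: p={p_b_given_a:.2g}")
```

In this toy example the marginal A-Y association vanishes once B is conditioned on, so a PC-style search would drop the A-Y edge and keep B-Y. With real, highly correlated binding profiles, weak associations and small samples, such tests lose power and can give inconsistent results, which is the regime the abstract says NCPC is designed to handle.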
