Predicting transcription factor site occupancy using DNA sequence intrinsic and cell-type specific chromatin features.

Author information

Kumar Sunil, Bucher Philipp

Affiliations

Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, EPFL, Station 15, Lausanne, CH-1015, Switzerland.

Swiss Institute of Bioinformatics (SIB), EPFL, Station 15, Lausanne, CH-1015, Switzerland.

Publication information

BMC Bioinformatics. 2016 Jan 11;17 Suppl 1(Suppl 1):4. doi: 10.1186/s12859-015-0846-z.

DOI:10.1186/s12859-015-0846-z

PMID:26818008

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4895346/

Abstract

BACKGROUND

Understanding the mechanisms by which transcription factors (TFs) are recruited to their physiological target sites is crucial for understanding gene regulation. Sequence-intrinsic DNA features such as predicted binding affinity are often poor predictors of in vivo site occupancy and, in any case, cannot explain cell-type-specific binding events. Recent reports show that chromatin accessibility, nucleosome occupancy and specific histone post-translational modifications greatly influence TF site occupancy in vivo. In this work, we use machine-learning methods to build predictive models and assess the relative importance of different sequence-intrinsic and chromatin features in the TF-to-target-site recruitment process.

METHODS

Our study primarily relies on recent data published by the ENCODE consortium. Five dissimilar TFs assayed in multiple cell types were selected as examples: CTCF, JunD, REST, GABP and USF2. We used two types of candidate target sites: (a) predicted sites obtained by scanning the whole genome with a position weight matrix (PWM), and (b) cell-type-specific peak lists provided by ENCODE. Quantitative in vivo occupancy levels in different cell types were based on ChIP-seq data for the corresponding TFs. In parallel, we computed a number of associated sequence-intrinsic and experimental features (histone modification, DNase I hypersensitivity, etc.) for each site. Machine-learning algorithms were then used in a binary classification and regression framework to predict site occupancy and binding strength, for the purpose of assessing the relative importance of different contextual features.
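To make candidate-site type (a) concrete, the following is a minimal sketch of scanning a sequence with a position weight matrix on both strands. The count matrix, pseudocount, uniform background, score threshold and all function names are illustrative assumptions; the abstract does not specify which matrices, scoring scheme or cut-offs the authors used.

```python
import math

# Hypothetical count matrix for a 4-bp motif with consensus GATA
# (one dict per motif position). A real analysis would load a curated matrix.
COUNTS = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert base counts into log2-odds scores against a uniform background (assumed)."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(seq, pwm, threshold):
    """Return (start, strand, score) for every window scoring at or above threshold.

    Start positions are 0-based and always given in forward-strand coordinates.
    """
    width = len(pwm)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            # Skip windows containing ambiguous bases such as N.
            if any(base not in "ACGT" for base in window):
                continue
            score = sum(pwm[j][base] for j, base in enumerate(window))
            if score >= threshold:
                start = i if strand == "+" else len(seq) - width - i
                hits.append((start, strand, round(score, 2)))
    return hits


PWM = log_odds_pwm(COUNTS)
hits = scan("CCGATACCTATCGG", PWM, threshold=6.0)
for start, strand, score in hits:
    print(start, strand, score)
```

In this toy sequence the motif occurs once on each strand. On a whole genome the same loop would run per chromosome, and the threshold would control how many candidate sites enter the downstream classification.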

RESULTS

We observed striking differences in the feature importance rankings between the five factors tested. PWM scores were amongst the most important features only for CTCF and REST but of little value for JunD and USF2. Chromatin accessibility and active histone marks are potent predictors for all factors except REST. Structural DNA parameters and repressive or gene-body-associated histone marks are generally of little or no predictive value.
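As a rough illustration of what a feature importance ranking means in a binary bound/unbound setting, the sketch below ranks features by their single-feature area under the ROC curve. This is a deliberately simple stand-in, not the authors' procedure: the abstract does not name the learning algorithm or importance measure, and the feature names and values here are invented toy data.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the rank-sum formula, with average ranks for ties."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        # Extend j over the run of tied scores starting at i.
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = len(pairs) - n_pos
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def rank_features(table, labels):
    """Sort features by how far their single-feature AUROC lies from 0.5 (chance)."""
    scored = {name: auroc(values, labels) for name, values in table.items()}
    return sorted(scored.items(), key=lambda item: abs(item[1] - 0.5), reverse=True)


# Toy data (hypothetical): 1 = site occupied in the ChIP-seq data, 0 = unoccupied.
LABELS = [1, 1, 1, 0, 0, 0]
FEATURES = {
    "pwm_score": [5, 2, 6, 4, 1, 3],
    "dnase_signal": [9, 8, 7, 3, 2, 1],
    "dna_shape": [1, 2, 3, 1, 2, 3],
}

ranking = rank_features(FEATURES, LABELS)
for name, value in ranking:
    print(f"{name}\t{value:.3f}")
```

A univariate score like this ignores interactions between features, which is one reason model-based importance measures are usually preferred in practice; it is shown here only because it is easy to verify by hand.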

CONCLUSIONS

We define a general and extensible computational framework for analyzing the importance of various DNA-intrinsic and chromatin-associated features in determining cell-type-specific TF binding to target sites. The application of our methodology to ENCODE data has led to new insights into transcription regulatory processes and may serve as an example for future studies encompassing even larger datasets.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c76/4895346/27ba1d2f8ce0/12859_2015_846_Fig1_HTML.jpg
