De novo prediction of cis-regulatory elements and modules through integrative analysis of a large number of ChIP datasets.

Authors

Niu Meng, Tabari Ehsan S, Su Zhengchang

Affiliation

Department of Bioinformatics and Genomics, College of Computing and Informatics, The University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223, USA.

Publication information

BMC Genomics. 2014 Dec 2;15:1047. doi: 10.1186/1471-2164-15-1047.

DOI:10.1186/1471-2164-15-1047

PMID:25442502

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4265420/

Abstract

BACKGROUND

In eukaryotes, transcriptional regulation is usually mediated by interactions of multiple transcription factors (TFs) with their specific cis-regulatory elements (CREs), which are located in cis-regulatory modules (CRMs) in DNA. Although knowledge of the CREs and CRMs in a genome is crucial for elucidating gene regulatory networks and understanding many important biological phenomena, little is known about the CREs and CRMs in most eukaryotic genomes because they are difficult to characterize by either computational or traditional experimental methods. However, the exponentially increasing volume of TF binding location data produced by the recent wide adoption of chromatin immunoprecipitation coupled with microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) has provided an unprecedented opportunity to identify CRMs and CREs in genomes. Nonetheless, effectively mining these large volumes of ChIP data to identify CREs and CRMs at nucleotide resolution remains highly challenging.

RESULTS

We have developed DePCRM, a novel graph-theoretic algorithm for genome-wide de novo prediction of CREs and CRMs from a large number of ChIP datasets. DePCRM predicts CREs and CRMs by effectively identifying overrepresented combinatorial CRE motif patterns across multiple ChIP datasets. When applied to 168 ChIP datasets for 56 TFs from D. melanogaster, DePCRM identified 184 overrepresented CRE motifs and 746 combinatorial patterns of these motifs, and predicted a total of 115,932 CRMs in the genome. The predictions recover 77.9% of known CRMs in the datasets and 89.3% of known CRMs containing at least one predicted CRE. We found that the putative CRMs, as well as the CREs within a CRM taken as a whole, are more conserved than randomly selected sequences.
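The abstract does not describe how DePCRM scores motif combinations, so the following is only a generic illustration of what "overrepresented combinatorial motif patterns" can mean, not the authors' method. It is a minimal sketch that assumes each ChIP peak has already been annotated with a set of motif names, and that a one-sided hypergeometric test is an acceptable way to ask whether two motifs share peaks more often than chance. The function names, the `peak_motifs` input layout, and the `alpha` cutoff are all hypothetical.

```python
from itertools import combinations
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when drawing n items from N, of which K are 'successes'."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def cooccurring_pairs(peak_motifs, alpha=0.01):
    """Return motif pairs that share peaks more often than expected by chance.

    peak_motifs maps a (hypothetical) peak ID to the set of motif names
    found in that peak. The result maps each significant pair to its p-value
    and can be read as the edge list of a motif co-occurrence graph.
    """
    n_peaks = len(peak_motifs)

    # Invert the annotation: motif name -> set of peaks containing it.
    peaks_with = {}
    for peak, motifs in peak_motifs.items():
        for motif in motifs:
            peaks_with.setdefault(motif, set()).add(peak)

    edges = {}
    for a, b in combinations(sorted(peaks_with), 2):
        shared = len(peaks_with[a] & peaks_with[b])
        if shared == 0:
            continue
        p = hypergeom_tail(shared, n_peaks, len(peaks_with[a]), len(peaks_with[b]))
        if p <= alpha:
            edges[(a, b)] = p
    return edges


# Toy data (invented for illustration): 20 peaks, motifs A and B always
# appear together in peaks 0-4, motif C appears in peaks 4-8.
toy = {f"peak{i}": set() for i in range(20)}
for i in range(5):
    toy[f"peak{i}"].update({"A", "B"})
for i in range(4, 9):
    toy[f"peak{i}"].add("C")

print(cooccurring_pairs(toy))
```

In this toy input only the A/B pair passes the cutoff, because A and C (and B and C) overlap in a single peak. A real analysis would also need multiple-testing correction and a background model for motif frequency, both of which this sketch omits.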

CONCLUSION

Our results suggest that the CRMs predicted by DePCRM are highly likely to be functional. Our algorithm is the first of its kind for de novo genome-wide prediction of CREs and CRMs using a large number of transcription factor ChIP datasets. The algorithm and predictions will hopefully facilitate the elucidation of gene regulatory networks in eukaryotes. All the predicted CREs, CRMs, and their target genes are available at http://bioinfo.uncc.edu/mniu/pcrms/www/.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41b5/4265420/d943011f67fa/12864_2014_6723_Fig1_HTML.jpg
