Discovering motifs in ranked lists of DNA sequences.

Author information

Eden Eran, Lipson Doron, Yogev Sivan, Yakhini Zohar

Affiliation

Computer Science Department, Technion, Haifa, Israel.

Publication information

PLoS Comput Biol. 2007 Mar 23;3(3):e39. doi: 10.1371/journal.pcbi.0030039.

DOI:10.1371/journal.pcbi.0030039

PMID:17381235

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC1829477/

Abstract

Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP-chip (chromatin immuno-precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP-chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP-chip data. The biological function of some of them was further investigated to gain new insights on transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic TF binding enhancement to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked. 
Overall, we demonstrate that the statistical framework embodied in the DRIM software tool is highly effective for identifying regulatory sequence elements in a variety of applications ranging from expression and ChIP-chip to CpG methylation data. DRIM is publicly available at http://bioinfo.cs.technion.ac.il/drim.
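
The abstract describes scoring motif enrichment in a ranked list without fixing the target/background split in advance. As a rough illustration only, the sketch below assumes a minimum-hypergeometric style statistic: for a ranked 0/1 vector marking which sequences contain a candidate motif, it evaluates the hypergeometric tail at every possible cutoff and keeps the smallest value. The function names and the input encoding are hypothetical and are not taken from DRIM. The minimum over cutoffs is not a p-value on its own, and the exact p-value and the handling of motif multiplicity mentioned in the abstract are not implemented here.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) where X counts motif-containing sequences among the
    top n of N ranked sequences, B of which contain the motif."""
    total = comb(N, n)
    upper = min(n, B)
    favourable = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1))
    return favourable / total


def min_hypergeom_score(occurrence):
    """Scan every prefix of a ranked 0/1 occurrence vector and return
    (smallest hypergeometric tail, cutoff at which it is attained).

    occurrence[i] is 1 if the i-th ranked sequence contains the motif.
    This is an illustrative statistic, not a corrected p-value.
    """
    N = len(occurrence)
    B = sum(occurrence)
    best_score, best_cutoff = 1.0, 0
    hits = 0
    for n, flag in enumerate(occurrence, start=1):
        hits += flag
        p = hypergeom_tail(N, B, n, hits)
        if p < best_score:
            best_score, best_cutoff = p, n
    return best_score, best_cutoff


if __name__ == "__main__":
    # Toy ranking: the motif occurs only in the three top-ranked sequences.
    ranked = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    score, cutoff = min_hypergeom_score(ranked)
    print(f"best cutoff = {cutoff}, score = {score:.5f}")
```

Because the cutoff is chosen by the data, the returned score is optimistically small; any real use would need a correction for having tried every cutoff.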
