A novel Bayesian DNA motif comparison method for clustering and retrieval.

Authors

Habib Naomi, Kaplan Tommy, Margalit Hanah, Friedman Nir

Affiliation

School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel.

Publication information

PLoS Comput Biol. 2008 Feb 29;4(2):e1000010. doi: 10.1371/journal.pcbi.1000010.

DOI:10.1371/journal.pcbi.1000010

PMID:18463706

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2265534/

Abstract

Characterizing the DNA-binding specificities of transcription factors is a key problem in computational biology that has been addressed by multiple algorithms. These usually take as input sequences that are putatively bound by the same factor and output one or more DNA motifs. A common practice is to apply several such algorithms simultaneously to improve coverage at the price of redundancy. In interpreting such results, two tasks are crucial: clustering of redundant motifs, and attributing the motifs to transcription factors by retrieval of similar motifs from previously characterized motif libraries. Both tasks inherently involve motif comparison. Here we present a novel method for comparing and merging motifs, based on Bayesian probabilistic principles. This method takes into account both the similarity in positional nucleotide distributions of the two motifs and their dissimilarity to the background distribution. We demonstrate the use of the new comparison method as a basis for motif clustering and retrieval procedures, and compare it to several commonly used alternatives. Our results show that the new method outperforms other available methods in accuracy and sensitivity. We incorporated the resulting motif clustering and retrieval procedures in a large-scale automated pipeline for analyzing DNA motifs. This pipeline integrates the results of various DNA motif discovery algorithms and automatically merges redundant motifs from multiple training sets into a coherent annotated library of motifs. Application of this pipeline to recent genome-wide transcription factor location data in S. cerevisiae successfully identified DNA motifs in a manner that is as good as semi-automated analysis reported in the literature. Moreover, we show how this analysis elucidates the mechanisms of condition-specific preferences of transcription factors.
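
The abstract describes a score that rewards two motifs for sharing positional nucleotide distributions and for differing from the background. The following is a minimal sketch of one way such a Bayesian score could be written; it is not taken from the paper, and the exact formulation used by the authors may differ. It assumes motifs are given as count matrices (one `[A, C, G, T]` count list per position), that the two motifs are already aligned and of equal length, and it uses a symmetric Dirichlet prior and a uniform background. The names `ALPHA`, `BACKGROUND`, `motif_score`, and `retrieve` are hypothetical.

```python
from math import lgamma, log

# Assumptions for illustration only (not values from the paper):
ALPHA = (1.0, 1.0, 1.0, 1.0)            # Dirichlet pseudo-counts for A, C, G, T
BACKGROUND = (0.25, 0.25, 0.25, 0.25)   # uniform background distribution


def log_marginal(counts, alpha=ALPHA):
    """Log marginal likelihood of one column's counts, integrating the
    unknown nucleotide distribution over a Dirichlet prior."""
    total_alpha = sum(alpha)
    value = lgamma(total_alpha) - lgamma(total_alpha + sum(counts))
    for a, n in zip(alpha, counts):
        value += lgamma(a + n) - lgamma(a)
    return value


def log_background(counts, background=BACKGROUND):
    """Log likelihood of one column's counts under the fixed background."""
    return sum(n * log(q) for n, q in zip(counts, background))


def column_score(col1, col2):
    """Log-ratio score for one aligned pair of columns.

    First term: common source vs. two independent sources (similarity).
    Second term: common source vs. background (informativeness).
    """
    merged = [a + b for a, b in zip(col1, col2)]
    common = log_marginal(merged)
    separate = log_marginal(col1) + log_marginal(col2)
    return (common - separate) + (common - log_background(merged))


def motif_score(motif1, motif2):
    """Sum of column scores for two pre-aligned, equal-length count matrices."""
    if len(motif1) != len(motif2):
        raise ValueError("this sketch expects pre-aligned motifs of equal length")
    return sum(column_score(c1, c2) for c1, c2 in zip(motif1, motif2))


def retrieve(query, library):
    """Return the name of the library motif that scores highest against query."""
    return max(library, key=lambda name: motif_score(query, library[name]))


if __name__ == "__main__":
    query = [[10, 0, 0, 0], [0, 10, 0, 0]]
    library = {
        "similar": [[9, 1, 0, 0], [0, 9, 1, 0]],
        "different": [[0, 0, 10, 0], [0, 0, 0, 10]],
    }
    for name, motif in library.items():
        print(name, round(motif_score(query, motif), 3))
    print("best match:", retrieve(query, library))
```

A practical comparison would also need to search over alignment offsets and the reverse-complement strand, which this sketch leaves out. The same pairwise score could serve as the similarity measure in an agglomerative clustering loop, merging the best-scoring pair by summing their counts.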

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f424/2265534/adadb86a7d0d/pcbi.1000010.g001.jpg

Similar articles

A novel ensemble learning method for de novo computational identification of DNA binding sites.

BMC Bioinformatics. 2007 Jul 12;8:249. doi: 10.1186/1471-2105-8-249.

Metamotifs--a generative model for building families of nucleotide position weight matrices.

BMC Bioinformatics. 2010 Jun 25;11:348. doi: 10.1186/1471-2105-11-348.

MATLIGN: a motif clustering, comparison and matching tool.

BMC Bioinformatics. 2007 Jun 8;8:189. doi: 10.1186/1471-2105-8-189.

Sequence features of DNA binding sites reveal structural class of associated transcription factor.

Bioinformatics. 2006 Jan 15;22(2):157-63. doi: 10.1093/bioinformatics/bti731. Epub 2005 Nov 2.

A discriminative approach for unsupervised clustering of DNA sequence motifs.

PLoS Comput Biol. 2013;9(3):e1002958. doi: 10.1371/journal.pcbi.1002958. Epub 2013 Mar 21.

Discriminative discovery of transcription factor binding sites from location data.

Proc IEEE Comput Syst Bioinform Conf. 2005:86-9. doi: 10.1109/csb.2005.30.

RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections.

Nucleic Acids Res. 2017 Jul 27;45(13):e119. doi: 10.1093/nar/gkx314.

Bioinformatics. 2008 Feb 1;24(3):350-7. doi: 10.1093/bioinformatics/btm610. Epub 2008 Jan 2.

SCOPE: a web server for practical de novo motif discovery.

Nucleic Acids Res. 2007 Jul;35(Web Server issue):W259-64. doi: 10.1093/nar/gkm310. Epub 2007 May 7.

Cited by

Strain-Specific Enhances the Efficacy of Cancer Therapeutics in Tumor-Bearing Mice.

Cancers (Basel). 2021 Feb 25;13(5):957. doi: 10.3390/cancers13050957.

PRMT5-dependent transcriptional repression of c-Myc target genes promotes gastric cancer progression.

Theranostics. 2020 Mar 15;10(10):4437-4452. doi: 10.7150/thno.42047. eCollection 2020.

ZBTB7A Mediates the Transcriptional Repression Activity of the Androgen Receptor in Prostate Cancer.

Cancer Res. 2019 Oct 15;79(20):5260-5271. doi: 10.1158/0008-5472.CAN-19-0815. Epub 2019 Aug 23.

Deciphering HLA-I motifs across HLA peptidomes improves neo-antigen predictions and identifies allostery regulating HLA specificity.

PLoS Comput Biol. 2017 Aug 23;13(8):e1005725. doi: 10.1371/journal.pcbi.1005725. eCollection 2017 Aug.

RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections.

Nucleic Acids Res. 2017 Jul 27;45(13):e119. doi: 10.1093/nar/gkx314.

Evolutionary Conservation and Diversification of Puf RNA Binding Proteins and Their mRNA Targets.

PLoS Biol. 2015 Nov 20;13(11):e1002307. doi: 10.1371/journal.pbio.1002307. eCollection 2015.

Target analysis by integration of transcriptome and ChIP-seq data with BETA.

Nat Protoc. 2013 Dec;8(12):2502-15. doi: 10.1038/nprot.2013.150. Epub 2013 Nov 21.

De novo prediction of DNA-binding specificities for Cys2His2 zinc finger proteins.

Nucleic Acids Res. 2014 Jan;42(1):97-108. doi: 10.1093/nar/gkt890. Epub 2013 Oct 3.

Jaccard index based similarity measure to compare transcription factor binding site models.

Algorithms Mol Biol. 2013 Sep 30;8(1):23. doi: 10.1186/1748-7188-8-23.

Identification of cis-regulatory modules in promoters of human genes exploiting mutual positioning of transcription factors.

Nucleic Acids Res. 2013 Oct;41(19):8822-41. doi: 10.1093/nar/gkt578. Epub 2013 Aug 2.

References

STAMP: a web tool for exploring DNA-binding motif similarities.

Nucleic Acids Res. 2007 Jul;35(Web Server issue):W253-8. doi: 10.1093/nar/gkm272. Epub 2007 May 3.

DNA familial binding profiles made easy: comparison of various motif alignment and clustering strategies.

PLoS Comput Biol. 2007 Mar 30;3(3):e61. doi: 10.1371/journal.pcbi.0030061. Epub 2007 Feb 15.

Quantifying similarity between motifs.

Genome Biol. 2007;8(2):R24. doi: 10.1186/gb-2007-8-2-r24.

A modelling approach to quantify dynamic crosstalk between the pheromone and the starvation pathway in baker's yeast.

FEBS J. 2006 Aug;273(15):3520-33. doi: 10.1111/j.1742-4658.2006.05359.x.

Regulation of mating and filamentation genes by two distinct Ste12 complexes in Saccharomyces cerevisiae.

Mol Cell Biol. 2006 Jul;26(13):4794-805. doi: 10.1128/MCB.02053-05.

Practical strategies for discovering regulatory DNA sequence motifs.

PLoS Comput Biol. 2006 Apr;2(4):e36. doi: 10.1371/journal.pcbi.0020036.

An improved map of conserved regulatory sites for Saccharomyces cerevisiae.

BMC Bioinformatics. 2006 Mar 7;7:113. doi: 10.1186/1471-2105-7-113.

Pheromone-regulated sumoylation of transcription factors that mediate the invasive to mating developmental switch in yeast.

J Biol Chem. 2006 Jan 27;281(4):1964-9. doi: 10.1074/jbc.M508985200. Epub 2005 Nov 23.

Protein-DNA binding specificity predictions with structural models.

Nucleic Acids Res. 2005 Oct 24;33(18):5781-98. doi: 10.1093/nar/gki875. Print 2005.

Ab initio prediction of transcription factor targets using structural knowledge.

PLoS Comput Biol. 2005 Jun;1(1):e1. doi: 10.1371/journal.pcbi.0010001. Epub 2005 Jun 24.
