Assessing phylogenetic motif models for predicting transcription factor binding sites.

Author information

Hawkins John, Grant Charles, Noble William Stafford, Bailey Timothy L

Affiliation

Institute for Molecular Bioscience, University of Queensland, Qld, Australia.

Publication information

Bioinformatics. 2009 Jun 15;25(12):i339-47. doi: 10.1093/bioinformatics/btp201.

DOI:10.1093/bioinformatics/btp201

PMID:19478008

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2687955/

Abstract

MOTIVATION

A variety of algorithms have been developed to predict transcription factor binding sites (TFBSs) within the genome by exploiting the evolutionary information implicit in multiple alignments of the genomes of related species. One such approach uses an extension of the standard position-specific motif model that incorporates phylogenetic information via a phylogenetic tree and a model of evolution. However, these phylogenetic motif models (PMMs) have never been rigorously benchmarked in order to determine whether they lead to better prediction of TFBSs than obtained using simple position weight matrix scanning.
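The phylogenetic motif model idea can be illustrated with a small sketch. It makes several assumptions. It uses an F81-style substitution model whose equilibrium frequencies are the motif column frequencies at motif positions and the background frequencies under the null model. It scores an aligned candidate site with Felsenstein's pruning algorithm and treats gaps as missing data. This is only an illustration, not the exact model used by MONKEY or the other scanners evaluated in the paper. The tree, branch lengths, species labels, motif values and function names are all hypothetical.

```python
import math

BASES = "ACGT"

def f81_prob(a, b, t, pi):
    """P(base b at the end of a branch | base a at its start) under an
    F81-style model with equilibrium frequencies pi and branch length t."""
    decay = math.exp(-t)
    return decay * (1.0 if a == b else 0.0) + (1.0 - decay) * pi[b]

def partial_likelihood(node, column, pi):
    """Felsenstein pruning: for each state of `node`, the probability of the
    bases observed at the leaves below it."""
    if isinstance(node, str):                      # leaf = species name
        obs = column.get(node, "-")
        if obs not in BASES:                       # gap or absent: missing data
            return {a: 1.0 for a in BASES}
        return {a: 1.0 if a == obs else 0.0 for a in BASES}
    result = {a: 1.0 for a in BASES}
    for child, t in node:                          # internal = ((child, t), ...)
        child_l = partial_likelihood(child, column, pi)
        for a in BASES:
            result[a] *= sum(f81_prob(a, b, t, pi) * child_l[b] for b in BASES)
    return result

def column_likelihood(tree, column, pi):
    """P(aligned column | tree, model); the root base is drawn from pi."""
    root = partial_likelihood(tree, column, pi)
    return sum(pi[a] * root[a] for a in BASES)

def pmm_log_odds(tree, alignment, motif, background):
    """Log-likelihood ratio of an aligned candidate site: motif position j uses
    motif[j] as equilibrium frequencies; the null model uses `background`."""
    score = 0.0
    for j, motif_col in enumerate(motif):
        column = {species: seq[j] for species, seq in alignment.items()}
        score += math.log(column_likelihood(tree, column, motif_col)
                          / column_likelihood(tree, column, background))
    return score

# Hypothetical 3-position motif (with pseudocounts) and uniform background.
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
BACKGROUND = {b: 0.25 for b in BASES}
# Hypothetical tree: scer joined to a (spar, smik) clade; branch lengths made up.
TREE = (("scer", 0.1), ((("spar", 0.1), ("smik", 0.2)), 0.05))

aligned_site = {"scer": "ACG", "spar": "ACG", "smik": "A-G"}
print(pmm_log_odds(TREE, aligned_site, MOTIF, BACKGROUND))
```

With a single-species tree such as `(("scer", 0.3),)`, the column likelihood collapses to the equilibrium frequency of the observed base. The score then reduces to the ordinary position weight matrix log-odds score, so the extra evidence a PMM scanner uses comes entirely from the aligned orthologous bases. Gap handling is a genuine modelling choice, since the scanners evaluated in the paper differ in exactly this respect. Treating gaps as missing data is just one simple option.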

RESULTS

We evaluate three PMM-based prediction algorithms, each of which uses a different treatment of gapped alignments, and we compare their prediction accuracy with that of a non-phylogenetic motif scanning approach. Surprisingly, all of these algorithms appear to be inferior to simple motif scanning, when accuracy is measured using a gold standard of validated yeast TFBSs. However, the PMM scanners perform much better than simple motif scanning when we abandon the gold standard and consider the number of statistically significant sites predicted, using column-shuffled 'random' motifs to measure significance. These results suggest that the common practice of measuring the accuracy of binding site predictors using collections of known sites may be dangerously misleading since such collections may be missing 'weak' sites, which are exactly the type of sites needed to discriminate among predictors. We then extend our previous theoretical model of the statistical power of PMM-based prediction algorithms to allow for loss of binding sites during evolution, and show that it gives a more accurate upper bound on scanner accuracy. Finally, utilizing our theoretical model, we introduce a new method for predicting the number of real binding sites in a genome. The results suggest that the number of true sites for a yeast TF is in general several times greater than the number of known sites listed in the Saccharomyces cerevisiae Promoter Database (SCPD). Among the three scanning algorithms that we test, the MONKEY algorithm has the highest accuracy for predicting yeast TFBSs.
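The significance idea can be sketched for the non-phylogenetic baseline. The sketch scans sequences with a position weight matrix, counts sites above a score threshold, and then repeats the scan with motifs whose columns have been randomly permuted. Column shuffling preserves each column's information content but destroys the binding pattern, so the mean shuffled count gives a rough estimate of how many hits arise by chance. The score threshold, the "observed minus expected" statistic, the forward-strand-only scan and all names below are assumptions made for illustration. The paper's exact procedure may differ.

```python
import math
import random

BASES = "ACGT"

def log_odds_matrix(motif, background):
    """Convert a probability matrix (list of per-position dicts) to log-odds."""
    return [{b: math.log(col[b] / background[b]) for b in BASES} for col in motif]

def scan(seq, lom):
    """Score every window of seq (forward strand only; seq must be A/C/G/T)."""
    w = len(lom)
    return [sum(lom[j][seq[i + j]] for j in range(w))
            for i in range(len(seq) - w + 1)]

def count_hits(seqs, lom, threshold):
    """Number of windows across all sequences scoring at or above threshold."""
    return sum(1 for s in seqs for score in scan(s, lom) if score >= threshold)

def shuffled_motif(motif, rng):
    """Permute the motif's columns: same information content, scrambled pattern."""
    cols = list(motif)
    rng.shuffle(cols)
    return cols

def excess_hits(seqs, motif, background, threshold, n_shuffles=100, seed=0):
    """Observed hits, mean hits for column-shuffled motifs, and their difference."""
    rng = random.Random(seed)
    observed = count_hits(seqs, log_odds_matrix(motif, background), threshold)
    null_counts = [
        count_hits(seqs, log_odds_matrix(shuffled_motif(motif, rng), background), threshold)
        for _ in range(n_shuffles)
    ]
    expected_false = sum(null_counts) / n_shuffles
    return observed, expected_false, observed - expected_false

# Hypothetical toy data: a 3-position motif and two short sequences.
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
BACKGROUND = {b: 0.25 for b in BASES}
SEQS = ["TTACGTT", "ACGAAA"]

observed, expected_false, excess = excess_hits(SEQS, MOTIF, BACKGROUND, threshold=2.5)
print(observed, expected_false, excess)
```

The same counting scheme works with any site scorer, including one that uses aligned orthologous sequence. One caveat applies when a motif has identical or near-identical columns. Some permutations then barely differ from the original motif, which inflates the null count and makes the excess estimate conservative.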
