Accurate recognition of cis-regulatory motifs with the correct lengths in prokaryotic genomes.

Affiliation

Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, GA 30602, USA.

Publication information

Nucleic Acids Res. 2010 Jan;38(2):e12. doi: 10.1093/nar/gkp907. Epub 2009 Nov 11.

DOI:10.1093/nar/gkp907

PMID:19906734

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2811016/

Abstract

We present a new computational method for a classical problem: identifying cis-regulatory motifs in a given set of promoter sequences. The method rests on one key new idea. Instead of scoring candidate motifs individually, as all existing motif-finding programs do, it scores groups of candidate motifs with similar sequences, called motif closures, using a P-value, which substantially improves prediction reliability over existing methods. The new P-value scoring scheme is independent of sequence length, so predicted motifs of different lengths can be compared directly on the same footing. We have implemented this method as the Motif Recognition Computer (MREC) program and have tested MREC extensively on both simulated and biological data from prokaryotic genomes. Our test results indicate that, for the vast majority of cases in our test set, MREC accurately picks out the actual motif with the correct length as the best-scoring candidate. We compared our predictions with those of two motif-finding programs, Cosmo and MEME, and found that MREC outperforms both across all test cases by a large margin. The MREC program is available at http://csbl.bmb.uga.edu/~bingqiang/MREC1/.
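The abstract does not define motif closures or the P-value formula, so the following Python sketch is only an illustration of the general idea, not MREC's actual algorithm. It assumes a closure is the set of observed k-mers within a fixed Hamming distance of a seed, assumes a uniform i.i.d. background with windows treated as independent, and uses a binomial tail as the P-value. All function names and parameters (`closure`, `score_closure`, `max_dist`) are hypothetical.

```python
from math import comb


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def windows(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def closure(seed, seqs, max_dist=1):
    """Observed k-mers within max_dist mismatches of seed (assumed definition)."""
    k = len(seed)
    return {w for s in seqs for w in windows(s, k) if hamming(seed, w) <= max_dist}


def background_hit_prob(k, max_dist, seq_len):
    """Chance that a random sequence contains a window within max_dist of a
    fixed seed, under a uniform background with independent windows."""
    p_window = sum(comb(k, j) * 3 ** j for j in range(max_dist + 1)) / 4 ** k
    return 1 - (1 - p_window) ** (seq_len - k + 1)


def binomial_tail(n, x, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(x, n + 1))


def score_closure(seed, seqs, max_dist=1):
    """Return (hits, p_value): how many sequences contain a closure member,
    and how surprising that count is under the background model.
    Assumes all sequences have the same length as the first one."""
    members = closure(seed, seqs, max_dist)
    hits = sum(1 for s in seqs if windows(s, len(seed)) & members)
    p = background_hit_prob(len(seed), max_dist, len(seqs[0]))
    return hits, binomial_tail(len(seqs), hits, p)


if __name__ == "__main__":
    # Toy promoter set (made up for illustration only).
    promoters = ["ACGTTGACAT", "GGTTGACACC", "TTTGACATAA", "CCCCCCCCCC"]
    for seed in ("TTGA", "TTGACA"):
        hits, p_value = score_closure(seed, promoters)
        print(seed, hits, p_value)
```

Because both candidates are scored as tail probabilities under the same background model, a length-4 seed and a length-6 seed can be ranked against each other directly, which is the kind of cross-length comparison the abstract describes.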
