A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs.

Affiliation

Department of Biomedical Engineering, One Shields Ave, University of California, Davis, CA 95616, USA.

Publication information

BMC Bioinformatics. 2012 Nov 27;13:317. doi: 10.1186/1471-2105-13-317.

DOI: 10.1186/1471-2105-13-317

PMID: 23181585

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3542263/

Abstract

BACKGROUND

Discovering functionally significant, short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Often, not all sequences in the set contain a motif, and these non-motif-containing sequences complicate algorithmic motif discovery. Filtering them out of the larger set while simultaneously determining the identity of the motif is therefore a desirable but non-trivial goal in motif discovery research.

RESULTS

We describe MotifCatcher, a framework that extends the sensitivity of existing motif-finding tools by using random sampling to effectively remove non-motif-containing sequences from the motif search. We developed two implementations of our algorithm, each built around a commonly used motif-finding tool, and applied our algorithm to three diverse chromatin immunoprecipitation (ChIP) data sets. In each case, the motif finder with the MotifCatcher extension showed improved sensitivity over the motif finder alone. Our approach organizes the discovered candidate functionally significant motifs into a tree, which yielded additional insights. In all cases, we were able to support our findings with experimental work from the literature.
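The random-sampling idea described above can be illustrated with a minimal Python sketch. It is not MotifCatcher's implementation: `toy_motif_finder` is a hypothetical stand-in (the exact k-mer present in the most sequences) for the real motif-finding tools, `monte_carlo_motif_search` and its parameters are illustrative names, motifs are treated as exact strings rather than position weight matrices, and the tree-building step is omitted. The sketch repeatedly runs the finder on random subsets of the input, tallies the motif returned by each run, and keeps only the sequences that contain the most frequently recovered motif.

```python
import random
from collections import Counter


def toy_motif_finder(seqs, k):
    """Hypothetical stand-in for a real motif finder: return the k-mer
    present in the largest number of sequences (each sequence counted once)."""
    counts = Counter()
    for s in seqs:
        # A set, so a k-mer repeated within one sequence is counted once.
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    # Highest count wins; ties are broken by the lexicographically smallest k-mer.
    return min(counts, key=lambda m: (-counts[m], m))


def monte_carlo_motif_search(seqs, k, subset_size, n_trials, seed=0):
    """Run the finder on many random subsets and tally the motifs found.

    Subsets that happen to be enriched for motif-containing sequences tend
    to return the shared motif, so it dominates the tally even when many
    input sequences lack it."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(n_trials):
        subset = rng.sample(seqs, subset_size)
        tally[toy_motif_finder(subset, k)] += 1
    best, _ = tally.most_common(1)[0]
    # Simplification: keep sequences containing the consensus as an exact substring.
    kept = [s for s in seqs if best in s]
    return best, kept, tally


# Toy input: three sequences share GACGTC; three are background repeats.
seqs = ["AAGACGTCAA", "CCGACGTCGG", "TTGACGTCTT",
        "ACACACACAC", "GTGTGTGTGT", "CTCTCTCTCT"]
best, kept, tally = monte_carlo_motif_search(seqs, k=6, subset_size=3, n_trials=500)
print(best, kept)
```

In this toy input, any subset holding at least two of the three GACGTC-containing sequences returns GACGTC, while subsets dominated by background sequences return a scatter of unrelated k-mers, so the shared motif accumulates the largest tally and the background sequences are filtered out.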

CONCLUSIONS

Our framework demonstrates that additional processing at the sequence entry level can significantly improve the performance of existing motif-finding tools. For each biological data set tested, we were able to propose novel biological hypotheses supported by experimental work from the literature. Specifically, in Escherichia coli, we proposed binding site motifs for 6 non-traditional LexA protein binding sites; in Saccharomyces cerevisiae, we hypothesized 2 disparate mechanisms for novel binding sites of the Cse4p protein; and in Halobacterium sp. NRC-1, we discovered subtle differences in a general transcription factor (GTF) binding site motif across several data sets. We suggest that these small differences in the discovered motif could confer specificity for one or more homologous GTF proteins. We offer a free implementation of the MotifCatcher software package at http://www.bme.ucdavis.edu/facciotti/resources_data/software/.
