MATLIGN: a motif clustering, comparison and matching tool.

Author information

Kankainen Matti, Löytynoja Ari

Affiliations

Institute of Biotechnology, University of Helsinki, Helsinki, Finland.

Publication information

BMC Bioinformatics. 2007 Jun 8;8:189. doi: 10.1186/1471-2105-8-189.

DOI: 10.1186/1471-2105-8-189

PMID: 17559640

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC1925120/

Abstract

BACKGROUND

Sequence motifs representing transcription factor binding sites (TFBS) are commonly encoded as position frequency matrices (PFM) or degenerate consensus sequences (CS). These formats are used to represent the characterised TFBS profiles stored in transcription factor databases, as well as to represent the potential motifs predicted using computational methods. To fill the gap between the known and predicted motifs, methods are needed for the post-processing of prediction results, i.e. for matching, comparison and clustering of pre-selected motifs. The computational identification of over-represented motifs in sets of DNA sequences is, in particular, a task where post-processing can dramatically simplify the analysis. Efficient post-processing, for example, reduces the redundancy of the motifs predicted and enables them to be annotated.
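
The relationship between the two formats can be illustrated with a minimal Python sketch that expands a degenerate consensus sequence into a position frequency matrix. The function name `consensus_to_pfm` and the example string are hypothetical, and splitting the frequency equally among the bases allowed by a degenerate symbol is an assumption made for illustration; it is not necessarily how Matlign treats CS input.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_pfm(consensus):
    """Convert a degenerate consensus string into a list of PFM columns.

    Each column is a dict over A, C, G, T whose values sum to 1.
    Assumption: the bases allowed by a degenerate symbol are equally likely.
    """
    pfm = []
    for symbol in consensus.upper():
        allowed = IUPAC[symbol]  # raises KeyError on a non-IUPAC symbol
        pfm.append({base: (1 / len(allowed) if base in allowed else 0.0)
                    for base in "ACGT"})
    return pfm


# Arbitrary example consensus; "S" stands for C or G.
for column in consensus_to_pfm("TGASTCA"):
    print(column)
```

The conversion only goes one way without loss: collapsing a PFM back to a consensus string discards the relative frequencies, which is one reason a comparison tool benefits from accepting both formats.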

RESULTS

To facilitate the post-processing of motifs in both PFM and CS formats, we have developed a tool called Matlign. The tool aligns motifs and evaluates their similarity using a combination of scoring functions, and visualises the results using hierarchical clustering. By limiting the number of distinct gaps created (though not their length), the alignment algorithm also correctly aligns motifs with an internal spacer. The method selects the best non-redundant motif set, with repetitive motifs merged together, by cutting the hierarchical tree using silhouette values. Our analyses show that Matlign can reliably discover the most similar analogue from a collection of characterised regulatory elements, so the method is also useful for annotating motif predictions by PFM library searches.
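
The idea of cutting a hierarchical tree by silhouette values can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Matlign's implementation: the abstract does not name the linkage method, so average linkage is assumed here, singleton clusters are given a silhouette of 0 by convention, and the function names and the toy distance matrix are hypothetical.

```python
from itertools import combinations


def mean_distance(dist, a, b):
    """Mean pairwise distance between the items of clusters a and b."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def average_linkage_partitions(dist):
    """Agglomerative clustering; returns {k: partition into k clusters}."""
    clusters = [[i] for i in range(len(dist))]
    partitions = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: mean_distance(dist, clusters[p[0]], clusters[p[1]]))
        merged = clusters[x] + clusters[y]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)]
        clusters.append(merged)
        partitions[len(clusters)] = [c[:] for c in clusters]
    return partitions


def mean_silhouette(dist, clusters):
    """Average silhouette value over all items (needs at least 2 clusters)."""
    scores = []
    for c in clusters:
        for i in c:
            if len(c) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            a = sum(dist[i][j] for j in c if j != i) / (len(c) - 1)
            b = min(mean_distance(dist, [i], o) for o in clusters if o is not c)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_cut(dist):
    """Pick the cut (2 <= k < n) with the highest mean silhouette."""
    partitions = average_linkage_partitions(dist)
    candidates = [partitions[k] for k in range(2, len(dist))]
    return max(candidates, key=lambda p: mean_silhouette(dist, p))


# Toy motif distances: items 0-2 are near-duplicates, as are items 3-4.
groups = [0, 0, 0, 1, 1]
dist = [[0.0 if i == j else (0.1 if groups[i] == groups[j] else 0.9)
         for j in range(5)] for i in range(5)]
print(best_cut(dist))
```

On the toy matrix the selected cut groups the three near-duplicates and the two near-duplicates into separate clusters; each such cluster would then correspond to one merged, non-redundant motif.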

CONCLUSION

Matlign is a user-friendly tool for post-processing large collections of DNA sequence motifs. Starting from a large number of potential regulatory motifs, Matlign provides a researcher with a non-redundant set of motifs, which can then be associated with known regulatory elements. A web server is available at http://ekhidna.biocenter.helsinki.fi/poxo/matlign.
