MATLIGN: a motif clustering, comparison and matching tool.

Authors

Kankainen Matti, Löytynoja Ari

Affiliation

Institute of Biotechnology, University of Helsinki, Helsinki, Finland.

Publication information

BMC Bioinformatics. 2007 Jun 8;8:189. doi: 10.1186/1471-2105-8-189.

Abstract

BACKGROUND

Sequence motifs representing transcription factor binding sites (TFBS) are commonly encoded as position frequency matrices (PFM) or degenerate consensus sequences (CS). These formats are used to represent the characterised TFBS profiles stored in transcription factor databases, as well as the potential motifs predicted by computational methods. To bridge the gap between known and predicted motifs, methods are needed for the post-processing of prediction results, i.e. for matching, comparison and clustering of pre-selected motifs. The computational identification of over-represented motifs in sets of DNA sequences is, in particular, a task where post-processing can dramatically simplify the analysis. Efficient post-processing, for example, reduces the redundancy among the predicted motifs and enables them to be annotated.
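To make the two encodings concrete, the following minimal sketch converts a small PFM into a degenerate consensus sequence using the standard IUPAC nucleotide codes. The function name pfm_to_consensus, the min_fraction threshold and the example counts are illustrative assumptions and are not taken from the paper; real tools use a variety of rules for deciding which bases count as present in a column.

```python
# Standard IUPAC degenerate nucleotide codes, keyed by the set of bases they cover.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def pfm_to_consensus(pfm, min_fraction=0.25):
    """Convert a PFM (list of columns, each a dict base -> count) to a consensus string.

    Assumption for this sketch: a base is kept in a column when its relative
    frequency is at least min_fraction. Empty columns fall back to "N".
    """
    consensus = []
    for column in pfm:
        total = sum(column.values())
        kept = frozenset(
            base for base, count in column.items()
            if total and count / total >= min_fraction
        )
        consensus.append(IUPAC.get(kept, "N"))
    return "".join(consensus)


# Hypothetical four-column PFM (counts of A, C, G, T at each position).
example_pfm = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 5, "C": 0, "G": 5, "T": 0},
    {"A": 0, "C": 3, "G": 3, "T": 4},
    {"A": 2, "C": 2, "G": 2, "T": 2},
]
print(pfm_to_consensus(example_pfm))  # prints "ARBN"
```

The conversion is lossy: the consensus keeps only which bases occur at each position, whereas the PFM also records how often they occur.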

RESULTS

To facilitate the post-processing of motifs in both PFM and CS formats, we have developed a tool called Matlign. The tool aligns motifs and evaluates their similarity using a combination of scoring functions, and visualises the results using hierarchical clustering. By limiting the number of distinct gaps created (though not their length), the alignment algorithm also correctly aligns motifs with an internal spacer. The method selects the best non-redundant motif set, with repetitive motifs merged together, by cutting the hierarchical tree using silhouette values. Our analyses show that Matlign can reliably discover the most similar analogue from a collection of characterised regulatory elements, so the method is also useful for annotating motif predictions through PFM library searches.
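The general pipeline described above (pairwise motif alignment, hierarchical clustering, and a silhouette-based cut) can be sketched as follows. This is not Matlign's implementation: the column score (one minus half the L1 distance), the ungapped sliding alignment, the min_overlap parameter and the choice of average linkage are all assumptions made for illustration, and the gapped alignment for spacers, reverse-complement matching and the merging of clustered motifs are omitted.

```python
from itertools import combinations


def one_hot(seq):
    """Turn a plain DNA string into a list of (A, C, G, T) frequency columns."""
    return [tuple(1.0 if base == b else 0.0 for b in "ACGT") for base in seq]


def column_similarity(p, q):
    """One minus half the L1 distance between two frequency columns (0..1)."""
    return 1.0 - 0.5 * sum(abs(x - y) for x, y in zip(p, q))


def motif_similarity(m1, m2, min_overlap=4):
    """Best mean column similarity over all ungapped offsets (assumed scoring)."""
    best = 0.0
    for offset in range(-(len(m2) - min_overlap), len(m1) - min_overlap + 1):
        pairs = [(m1[i], m2[i - offset]) for i in range(len(m1))
                 if 0 <= i - offset < len(m2)]
        if len(pairs) >= min_overlap:
            score = sum(column_similarity(p, q) for p, q in pairs) / len(pairs)
            best = max(best, score)
    return best


def distance_matrix(motifs):
    n = len(motifs)
    return [[0.0 if i == j else 1.0 - motif_similarity(motifs[i], motifs[j])
             for j in range(n)] for i in range(n)]


def average_linkage(dist):
    """Naive agglomerative clustering; returns the partition after every merge."""
    clusters = [[i] for i in range(len(dist))]
    partitions = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        def linkage(pair):
            a, b = pair
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            return total / (len(clusters[a]) * len(clusters[b]))
        a, b = min(combinations(range(len(clusters)), 2), key=linkage)
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        partitions.append([list(c) for c in clusters])
    return partitions


def mean_silhouette(dist, partition):
    """Mean silhouette value; members of singleton clusters score 0."""
    scores = []
    for cluster in partition:
        for i in cluster:
            if len(cluster) == 1:
                scores.append(0.0)
                continue
            a = sum(dist[i][j] for j in cluster if j != i) / (len(cluster) - 1)
            b = min(sum(dist[i][j] for j in other) / len(other)
                    for other in partition if other is not cluster)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_partition(dist):
    """Cut the tree at the level with the highest mean silhouette value."""
    partitions = average_linkage(dist)
    candidates = [p for p in partitions if len(p) > 1] or partitions[:1]
    return max(candidates, key=lambda p: mean_silhouette(dist, p))


# Hypothetical input: two shifted copies each of two unrelated motifs.
motifs = [one_hot(s) for s in ("TGACTCA", "ATGACTCA", "GGGCGGGG", "GGGGCGGG")]
clusters = best_partition(distance_matrix(motifs))
print(sorted(sorted(c) for c in clusters))  # prints [[0, 1], [2, 3]]
```

In this toy input the shifted copies align perfectly, so the silhouette criterion picks the two-cluster cut and each cluster corresponds to one non-redundant motif.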

CONCLUSION

Matlign is a user-friendly tool for post-processing large collections of DNA sequence motifs. Starting from a large number of potential regulatory motifs, Matlign provides a researcher with a non-redundant set of motifs, which can then be associated with known regulatory elements. A web server is available at http://ekhidna.biocenter.helsinki.fi/poxo/matlign.
