A visual framework for sequence analysis using n-grams and spectral rearrangement.

Affiliation

Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia.

Publication information

Bioinformatics. 2010 Mar 15;26(6):737-44. doi: 10.1093/bioinformatics/btq042. Epub 2010 Feb 3.

Abstract

MOTIVATION

Protein sequences are often composed of regions that have distinct evolutionary histories as a consequence of domain shuffling, recombination or gene conversion. New approaches are required to discover, visualize and analyze these sequence regions and thus enable a better understanding of protein evolution.

RESULTS

Here, we have developed an alignment-free and visual approach to analyze sequence relationships. We use the number of shared n-grams between sequences as a measure of sequence similarity and rearrange the resulting affinity matrix applying a spectral technique. Heat maps of the affinity matrix are employed to identify and visualize clusters of related sequences or outliers, while n-gram-based dot plots and conservation profiles allow detailed analysis of similarities among selected sequences. Using this approach, we have identified signatures of domain shuffling in an otherwise poorly characterized family, and homology clusters in another. We conclude that this approach may be generally useful as a framework to analyze related, but highly divergent protein sequences. It is particularly useful as a fast method to study sequence relationships prior to much more time-consuming multiple sequence alignment and phylogenetic analysis.
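The pipeline described above (shared n-gram counts, an affinity matrix, then a spectral reordering) can be illustrated with a minimal sketch. The abstract does not state the n-gram length, any normalization, or which spectral technique MOSAIC applies, so everything specific here is an assumption: n = 3, raw counts of distinct shared n-grams, and ordering by the Fiedler vector of the graph Laplacian (one common spectral seriation method, computed by plain power iteration). The function names and the toy sequences are hypothetical and are not taken from MOSAIC.

```python
from itertools import combinations


def ngrams(seq, n=3):
    """Return the set of distinct overlapping n-grams in a sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def affinity_matrix(seqs, n=3):
    """Symmetric matrix; entry (i, j) counts distinct n-grams shared by
    sequences i and j. The diagonal is left at zero."""
    grams = [ngrams(s, n) for s in seqs]
    size = len(seqs)
    A = [[0.0] * size for _ in range(size)]
    for i, j in combinations(range(size), 2):
        A[i][j] = A[j][i] = float(len(grams[i] & grams[j]))
    return A


def fiedler_order(A, iters=500):
    """Order indices by the Fiedler vector of the Laplacian L = D - A.

    Assumed technique: power iteration on (c*I - L) with the constant
    vector projected out at every step, so the iteration converges to the
    eigenvector of L's second-smallest eigenvalue.
    """
    size = len(A)
    if size < 2:
        return list(range(size))
    deg = [sum(row) for row in A]
    c = 2.0 * max(deg) + 1.0  # exceeds L's largest eigenvalue (Gershgorin)
    v = [float(i) for i in range(size)]  # deterministic start vector
    for _ in range(iters):
        # w = (c*I - L) v = (c - d_i) v_i + sum_j A_ij v_j
        w = [(c - deg[i]) * v[i] + sum(A[i][j] * v[j] for j in range(size))
             for i in range(size)]
        mean = sum(w) / size
        w = [x - mean for x in w]  # remove the constant component
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return sorted(range(size), key=lambda i: v[i])


def reorder(A, order):
    """Apply the same permutation to rows and columns of A."""
    return [[A[i][j] for j in order] for i in order]


# Toy input: two made-up families, interleaved. In this example the two
# families happen to share no 3-grams at all.
seqs = [
    "MKTAYIAKQR", "GGSLVPRGSH",
    "MKTAYIAKQL", "WGSLVPRGSH",
    "QKTAYIAKQR", "GGSLVPRGSW",
]
A = affinity_matrix(seqs, n=3)
order = fiedler_order(A)
for row in reorder(A, order):
    print(" ".join(f"{int(x):2d}" for x in row))
```

After reordering, members of the same toy family sit next to each other, so the printed matrix shows two diagonal blocks of the kind a heat map would make visible. Which family comes first is arbitrary, because an eigenvector's sign is not determined.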

AVAILABILITY

A software implementation (MOSAIC) of the framework described here can be downloaded from http://bioinformatics.org.au/mosaic/

CONTACT

m.ragan@uq.edu.au

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
