Discovery and Analysis of Repeat and Low-Complexity Architectures in Proteins and Their Conserved Evolutionary Relationships Using Self-Homology Dot Plots.

Affiliations

Structural Biology Group, Biological and Chemical Research Centre, Faculty of Chemistry, University of Warsaw, Warsaw, Poland.

i3S - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal.

Publication information

Methods Mol Biol. 2025;2870:95-116. doi: 10.1007/978-1-0716-4213-9_7.

Abstract

Proteins that contain sequence repetitions and low complexity regions can be analyzed using self-homology dot plot analysis. Dot plots can readily identify protein sequence repeats; the number of repeats and their length and location within the protein sequence are readily identifiable from the dot plots without the need to pre-define any of these attributes, making this method largely model-independent. We discuss the criteria for statistical identification of protein repeats and recommend simple ways of identifying protein repeats. While higher levels of sequence conservation within the repeats do make them easier to formally identify, this method can identify protein repeats with fairly low levels of conservation, as well as notably non-tandem repetitions with sizeable sections of complex, non-repeat sequence separating the individual repeat instances. Furthermore, even simple visual examination of these dot plots can discover conserved patterns within families of closely related proteins, and the level of this conservation can be readily quantified using a Jaccard index. Exhaustive pairwise comparisons can be assembled using hierarchical clustering methods to get a picture of the conserved repeat architectures within families of repeat proteins.
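The core idea of a self-homology dot plot, and of scoring the similarity of two such plots with a Jaccard index, can be illustrated with a minimal sketch. This is not the chapter's implementation: the abstract does not give the scoring scheme, so the sketch assumes simple residue identity within a sliding window, and the function names, the `window` and `min_matches` parameters, and the toy sequences are all hypothetical. It also assumes the two plots being compared share a coordinate frame (here, sequences of equal length).

```python
from collections import Counter


def self_dot_plot(seq, window=5, min_matches=4):
    """Compare a sequence against itself with a sliding window.

    Returns the set of (i, j) cells with i < j where the windows starting at
    i and j share at least `min_matches` identical residues. Only the upper
    triangle is kept because a self-comparison is symmetric, and the trivial
    main diagonal (i == j) is skipped.
    Assumption: identity scoring; a substitution matrix could be used instead.
    """
    n_windows = len(seq) - window + 1
    dots = set()
    for i in range(n_windows):
        for j in range(i + 1, n_windows):
            matches = sum(
                a == b for a, b in zip(seq[i:i + window], seq[j:j + window])
            )
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def diagonal_offsets(dots):
    """Count dots per off-diagonal; a well-populated offset hints at a repeat period."""
    return Counter(j - i for i, j in dots)


def jaccard_index(dots_a, dots_b):
    """Jaccard index |A & B| / |A | B| of two dot sets (1.0 if both are empty)."""
    union = dots_a | dots_b
    if not union:
        return 1.0
    return len(dots_a & dots_b) / len(union)


# Toy sequences invented for illustration; they are not real proteins.
protein_a = "ACDEFGHIKACDEFGHIK"  # a 9-residue unit repeated twice
protein_b = "ACDEFGHIKACDEFGHIR"  # same, with one substitution at the end

plot_a = self_dot_plot(protein_a, window=5, min_matches=5)
plot_b = self_dot_plot(protein_b, window=5, min_matches=5)

print(diagonal_offsets(plot_a).most_common(1))  # most populated off-diagonal
print(jaccard_index(plot_a, plot_b))            # similarity of the two plots
```

In the plot, a repeat appears as a run of dots along an off-diagonal, and the offset of that diagonal from the main one gives the spacing between repeat instances, so neither the repeat length nor the repeat count has to be specified in advance. For a family of proteins, the pairwise values `1 - jaccard_index(...)` could serve as distances for a hierarchical clustering routine such as `scipy.cluster.hierarchy.linkage`, which accepts a condensed distance matrix.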
