Occurrence of disordered patterns and homorepeats in eukaryotic and bacterial proteomes.

Author information

Lobanov M Yu, Galzitskaya O V

Affiliation

Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region, Russia.

Publication information

Mol Biosyst. 2012 Jan;8(1):327-37. doi: 10.1039/c1mb05318c. Epub 2011 Oct 18.

Abstract

Combining motif discovery with the identification of disordered protein segments in the PDB allowed us to create the first and largest library of disordered patterns. At present the library includes 109 disordered patterns. Here we offer a comprehensive analysis of the occurrence of selected disordered patterns and of 20 homorepeats, each 6 residues long, in 123 proteomes. Twenty-seven disordered patterns occur sparsely in all considered proteomes, but the low-complexity patterns (homorepeats) appear more often in eukaryotic than in bacterial proteomes. We compared the number of proteins containing homorepeats 6 residues long with the number containing the selected disordered patterns in these proteomes. For 9 kingdoms of Eukaryota and 5 phyla of Bacteria, we calculated matrices of correlation coefficients between the numbers of proteins in which a six-residue homorepeat of each of the 20 amino acid types, or one of the 109 disordered patterns from the library, appears at least once. As a rule, the correlation coefficients are higher within a considered kingdom than between kingdoms. The largest fraction of homorepeats of 6 residues, 46%, belongs to Amoebozoa proteomes (D. discoideum). Moreover, the longest uninterrupted repeat is S306 from D. discoideum (Amoebozoa). Homorepeats of some amino acids occur more frequently than others, and the type of homorepeat varies across proteomes. For example, E6 is the most frequent homorepeat in all considered Chordata proteomes, Q6 in Arthropoda, and S6 in Nematoda. Ranked by decreasing average occurrence across 97 eukaryotic proteomes, the runs of 6 identical amino acids are: Q6, S6, A6, G6, N6, E6, P6, T6, D6, K6, L6, H6, R6, F6, V6, I6, Y6, C6, M6, W6; for 26 bacterial proteomes the order is A6, G6, P6, and the others occur seldom. This suggests that such short similar motifs are responsible for common functions in nonhomologous, unrelated proteins from different organisms.
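To make the counting step concrete, here is a minimal Python sketch of how one might tally, for each of the 20 residue types, the proteins that contain a run of at least 6 identical residues, and find the longest uninterrupted run in a sequence. It is not the authors' pipeline: the function names, the `min_len` parameter, and the toy sequences are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residue types


def proteins_with_homorepeats(proteome, min_len=6):
    """For each residue type, count the proteins that contain at least one
    run of min_len or more identical residues (hypothetical helper)."""
    counts = Counter()
    for sequence in proteome.values():
        for aa in AMINO_ACIDS:
            # A run of >= min_len residues always contains aa * min_len.
            if aa * min_len in sequence:
                counts[aa] += 1
    return counts


def longest_run(sequence):
    """Return (residue, length) of the longest uninterrupted homorepeat."""
    best = ("", 0)
    # "(.)\1*" matches one residue followed by any repeats of itself,
    # so finditer walks the sequence run by run.
    for match in re.finditer(r"(.)\1*", sequence):
        run = match.group(0)
        if len(run) > best[1]:
            best = (run[0], len(run))
    return best


# Toy proteome (made-up sequences, not real data).
toy_proteome = {
    "protA": "MK" + "Q" * 6 + "L" + "S" * 6 + "A",
    "protB": "M" + "A" * 6 + "GGT",
    "protC": "MKTLV" + "Q" * 8 + "E",
}

counts = proteins_with_homorepeats(toy_proteome)
for aa, n in counts.most_common():
    print(f"{aa}6: {n} protein(s)")

print(longest_run(toy_proteome["protC"]))
```

Each protein is counted at most once per residue type, which matches the "appears at least once" criterion in the abstract; per-proteome count vectors produced this way could then be compared with any standard correlation coefficient.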
