Nacher J C, Hayashida M, Akutsu T
Department of Complex and Intelligent Systems, Future University-Hakodate, Hakodate, Hokkaido, Japan.
Biosystems. 2010 Aug;101(2):127-35. doi: 10.1016/j.biosystems.2010.05.005.
Many proteins consist of several structural domains. These multi-domain proteins have likely been generated by selective genome growth dynamics during evolution, both to perform new functions and to create structures that fold on a biologically feasible time scale. Domain units have frequently evolved through a variety of genetic shuffling mechanisms. Here we examine the protein domain statistics of more than 1000 organisms, including eukaryotic, archaeal and bacterial species. The analysis extends earlier findings on asymmetric statistical laws for the proteome to a wider variety of species. While the number of domains per protein spans a wide range and its distribution displays a power-law decay, the number of distinct domain families per protein follows an exponential distribution, characterizing a protein universe composed of a small number of unique families. Structural studies in proteomics have shown that domain repeats, or internally duplicated domains, represent a small but significant fraction of the genome. In spite of its importance, this observation was largely overlooked until recently. We model the evolutionary dynamics of the proteome and demonstrate that these distinct distributions are in fact rooted in an internal duplication mechanism. This process generates the contemporary protein structural domain universe, determines its reduced thickness, and tames its growth. These findings have important implications, ranging from protein interaction network modeling to evolutionary studies based on the fundamental mechanisms governing genome expansion.
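The abstract contrasts two distributions: domains per protein (power-law decay) and distinct domain families per protein (exponential). It does not describe the model itself, so what follows is a hypothetical toy simulation, not the authors' published model. It assumes that each step adds one new single-domain protein, that an internal duplication picks a protein with probability proportional to its current domain count (a rich-get-richer rule), and that acquiring a new family picks a protein uniformly at random. The function `simulate` and its parameters `dup_per_step`, `fam_per_step` and `seed` are illustrative names only.

```python
import random


def simulate(steps: int, dup_per_step: int = 2, fam_per_step: int = 1,
             seed: int = 0) -> tuple[list[int], list[int]]:
    """Hypothetical toy growth model, for illustration only.

    Each step adds one new single-domain protein, then applies:
      * dup_per_step internal duplications: a protein is chosen with
        probability proportional to its domain count and gains a copy of
        one of its own domains (domain count +1, family count unchanged);
      * fam_per_step family acquisitions: a protein is chosen uniformly
        at random and gains a domain from a new family
        (domain count +1, family count +1).
    Returns (domains per protein, distinct families per protein).
    """
    rng = random.Random(seed)
    domains: list[int] = []   # domains[i]: number of domains in protein i
    families: list[int] = []  # families[i]: distinct families in protein i
    owners: list[int] = []    # one entry per domain, so sampling is size-proportional

    def add_domain(i: int) -> None:
        domains[i] += 1
        owners.append(i)

    for _ in range(steps):
        # A new protein enters with a single domain of a single family.
        domains.append(1)
        families.append(1)
        owners.append(len(domains) - 1)
        for _ in range(dup_per_step):
            # Internal duplication: preferential (size-proportional) choice.
            add_domain(rng.choice(owners))
        for _ in range(fam_per_step):
            # New family: uniform choice among existing proteins.
            i = rng.randrange(len(domains))
            add_domain(i)
            families[i] += 1
    return domains, families


if __name__ == "__main__":
    d, f = simulate(20000)
    print("largest protein (domains):", max(d))
    print("most families in one protein:", max(f))
    # Proteins with at least k domains vs. at least k distinct families.
    for k in (2, 5, 10, 20, 50):
        print(k, sum(x >= k for x in d), sum(x >= k for x in f))
```

Under these assumptions, size-proportional selection in a growing population tends to give domain counts a heavy, power-law-like tail, while uniform selection tends to give family counts an exponential-like distribution. This is the standard contrast between preferential and uniform attachment, offered only as intuition for the abstract's claim.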