Institute of Enzymology, Research Center For Natural Sciences, Hungarian Academy of Sciences, Karolina út 29, Budapest, Hungary.
Genome Biol. 2011 Dec 19;12(12):R120. doi: 10.1186/gb-2011-12-12-r120.
Sequencing the genomes of the first few eukaryotes created the impression that gene number shows no correlation with organism complexity, often referred to as the G-value paradox. Several attempts have previously been made to resolve this paradox, citing multifunctionality of proteins, alternative splicing, microRNAs or non-coding DNA. As intrinsic protein disorder has been linked with complex responses to environmental stimuli and communication between cells, an additional possibility is that structural disorder may effectively increase the complexity of species.
We revisited the G-value paradox by analyzing many new proteomes from organisms whose complexity, measured as the number of distinct cell types, is known. We found that complexity correlates significantly with proteome size, measured as the total number of amino acids, and that the two are related by a power function. We also systematically analyzed numerous other features in relation to complexity across several organisms and tissues and found that: (i) the fraction of protein structural disorder increases significantly between prokaryotes and eukaryotes but does not increase further over the course of evolution; (ii) the number of predicted binding sites in disordered regions of a proteome increases with complexity; and (iii) the fraction of protein disorder, predicted binding sites, alternative splicing and protein-protein interactions all increase with the complexity of human tissues.
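As an illustration of what a power-function relationship between proteome size and complexity means in practice, the following is a minimal sketch of fitting a power law by linear regression in log-log space. It is not the authors' analysis pipeline: the function name `fit_power_law`, the coefficient values, and the input numbers are all hypothetical, and the data are synthetic values generated from a known power law rather than measurements from the study.

```python
import numpy as np


def fit_power_law(proteome_size, cell_types):
    """Fit cell_types ~ a * proteome_size**b by least squares on log10 values.

    Returns (a, b, r), where r is the Pearson correlation of the log values.
    """
    x = np.log10(np.asarray(proteome_size, dtype=float))
    y = np.log10(np.asarray(cell_types, dtype=float))
    # A degree-1 polyfit returns [slope, intercept]; the slope is the exponent b.
    b, log_a = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return 10.0 ** log_a, b, r


# Synthetic, illustrative inputs only (NOT data from the study).
# Proteome sizes are total amino acid counts; the "true" coefficients are assumed.
sizes = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
true_a, true_b = 1e-4, 0.8
types = true_a * sizes ** true_b

a, b, r = fit_power_law(sizes, types)
print(f"a = {a:.3g}, b = {b:.3f}, r = {r:.4f}")
```

With real data the points would scatter around the fitted line, and the exponent `b` and correlation `r` would summarize how steeply and how consistently complexity scales with proteome size. Fitting separate groups (for example, plants and metazoans) with this function would give each its own `a` and `b`.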
We conclude that complexity is a multi-parametric trait, determined by interaction potential, alternative splicing capacity, tissue-specific protein disorder and, above all, proteome size. The G-value paradox appears only when plants are grouped with metazoans, because plants show a different relationship between complexity and proteome size.