Wolf Yuri I, Carmel Liran, Koonin Eugene V
National Institutes of Health, National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA.
Proc Biol Sci. 2006 Jun 22;273(1593):1507-15. doi: 10.1098/rspb.2006.3472.
Recent genome analyses have revealed intriguing correlations between variables characterizing the functioning of a gene, such as expression level (EL), connectivity of genetic and protein-protein interaction networks, and knockout effect, and variables describing gene evolution, such as sequence evolution rate (ER) and propensity for gene loss. Typically, variables within each of these classes are positively correlated, e.g. products of highly expressed genes also tend to be involved in many protein-protein interactions, whereas variables from different classes are negatively correlated, e.g. highly expressed genes, on average, evolve more slowly than weakly expressed genes. Here, we describe principal component (PC) analysis of seven genome-related variables and propose biological interpretations for the first three PCs. The first PC reflects a gene's 'importance', or the 'status' of a gene in the genomic community, with positive contributions from knockout lethality, EL, number of protein-protein interaction partners and the number of paralogues, and negative contributions from sequence ER and gene loss propensity. The next two PCs define a plane that seems to reflect the functional and evolutionary plasticity of a gene. Specifically, PC2 can be interpreted as a gene's 'adaptability', whereby genes with high adaptability readily duplicate, have many genetic interaction partners and tend to be non-essential. PC3 might also reflect the role of a gene in organismal adaptation, albeit with a negative rather than a positive contribution of genetic interactions; we provisionally designate this PC 'reactivity'. The interpretation of PC2 and PC3 as measures of a gene's plasticity is compatible with the observation that genes with high values of these PCs tend to be expressed in a condition- or tissue-specific manner. Functional classes of genes vary substantially in status, adaptability and reactivity, with the highest status characteristic of the translation system and cytoskeletal proteins, the highest adaptability seen in cellular processes and signalling genes, and the highest reactivity characteristic of metabolic enzymes.
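To make the kind of analysis described above concrete, the following is a minimal sketch of PCA on the correlation matrix of seven per-gene variables. It is not the authors' pipeline: the variable names, the synthetic data, the single latent 'status' factor used to generate them, and the sign convention are all assumptions made for illustration, and the numbers it produces say nothing about real genomes.

```python
import numpy as np

# Hypothetical column names standing in for the seven variables in the abstract.
VARIABLES = [
    "knockout_lethality", "expression_level", "protein_interactions",
    "genetic_interactions", "paralogues", "evolution_rate",
    "gene_loss_propensity",
]


def pca_on_correlations(data, anchor=1):
    """PCA of column variables via eigendecomposition of their correlation matrix.

    Returns (explained_variance_ratio, loadings, scores). The sign of each PC is
    arbitrary, so every PC is oriented to give the `anchor` variable a
    non-negative loading (an illustrative convention, not part of PCA itself).
    """
    # Standardize so that variables on different scales contribute equally.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    flip = np.where(eigvecs[anchor, :] < 0, -1.0, 1.0)
    eigvecs = eigvecs * flip                  # flip whole columns (PCs)
    scores = z @ eigvecs                      # per-gene coordinates on each PC
    return eigvals / eigvals.sum(), eigvecs, scores


# Synthetic data (assumption): one latent factor drives six variables with the
# signs the abstract reports for PC1; genetic interactions are left independent.
rng = np.random.default_rng(0)
n_genes = 2000
latent_status = rng.normal(size=n_genes)
signs = np.array([1.0, 1.0, 1.0, 0.0, 1.0, -1.0, -1.0])
data = 0.8 * latent_status[:, None] * signs + rng.normal(size=(n_genes, 7))

explained, loadings, scores = pca_on_correlations(data)

for name, weight in zip(VARIABLES, loadings[:, 0]):
    print(f"{name:22s} PC1 loading {weight:+.2f}")
print("variance explained by PC1-PC3:", np.round(explained[:3], 3))
```

Because the synthetic data contain only one built-in factor, only PC1 is meaningful in this toy run; with real measurements, the loadings of PC2 and PC3 would be inspected in the same way to see which variables contribute positively or negatively to each component.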