J. Harry Caufield, Marco Abreu, Christopher Wimble, Peter Uetz
Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, Virginia, United States of America.
PLoS Comput Biol. 2015 Feb 27;11(2):e1004107. doi: 10.1371/journal.pcbi.1004107. eCollection 2015 Feb.
Large-scale analyses of protein complexes have recently become available for Escherichia coli and Mycoplasma pneumoniae, yielding 443 and 116 heteromultimeric soluble protein complexes, respectively. We combined these mass spectrometry-characterized complexes with the 285 "gold standard" protein complexes identified by EcoCyc. Comparison with databases of gene orthology, conservation, and essentiality identified proteins that are conserved in, or lost from, the corresponding complexes of other species. For instance, of the 285 "gold standard" protein complexes in E. coli, fewer than 10% are fully conserved among a set of 7 distantly related bacterial "model" species. Complex conservation follows one of three models: well-conserved complexes, complexes with a conserved core, and complexes with partial conservation but no conserved core. Expanding the comparison to 894 distinct bacterial genomes illustrates fractional conservation and the limits of co-conservation among components of protein complexes: just 14 of the 285 model protein complexes are perfectly conserved across 95% of the genomes used, yet we predict that more than 180 may be partially conserved across at least half of the genomes. No clear relationship between gene essentiality and protein complex conservation is observed, as even poorly conserved complexes contain a significant number of essential proteins. Finally, we identify 183 complexes that contain both well-conserved components and uncharacterized proteins; these will be interesting targets for future experimental studies.
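The three conservation models and the "perfect" versus "partial" conservation counts can be made concrete with a small sketch. It assumes orthology has already been reduced to presence or absence of each subunit in each genome. The function names, the toy gene labels, and the 50% subunit threshold used for "partial" conservation are hypothetical illustrations, not the authors' pipeline or criteria.

```python
from __future__ import annotations


def classify_pattern(subunits: set[str], genomes: list[set[str]]) -> str:
    """Assign one of three conservation models across a set of genomes.

    Each genome is the (hypothetical) set of subunit genes that have a
    detected ortholog in that genome.
    """
    if not subunits or not genomes:
        raise ValueError("need at least one subunit and one genome")
    # Core = subunits that have an ortholog in every genome considered.
    core = set(subunits)
    for genome in genomes:
        core &= genome
    if core == subunits:
        return "well-conserved"
    if core:
        return "conserved core"
    return "partial, no core"


def conservation_fractions(
    subunits: set[str], genomes: list[set[str]], min_fraction: float = 0.5
) -> tuple[float, float]:
    """Return (fraction of genomes with every subunit,
    fraction of genomes with at least min_fraction of the subunits).

    min_fraction is an assumed cutoff for "partial" conservation.
    """
    if not subunits or not genomes:
        raise ValueError("need at least one subunit and one genome")
    n = len(genomes)
    full = sum(1 for g in genomes if subunits <= g)
    partial = sum(
        1 for g in genomes if len(subunits & g) / len(subunits) >= min_fraction
    )
    return full / n, partial / n


if __name__ == "__main__":
    complex_abcd = {"a", "b", "c", "d"}  # toy four-subunit complex
    genomes = [{"a", "b", "c", "d"}, {"a", "b"}, {"a", "c", "x"}]
    print(classify_pattern(complex_abcd, genomes))
    print(conservation_fractions(complex_abcd, genomes))
```

Under these assumptions, a complex would count as perfectly conserved across 95% of genomes when the first returned fraction is at least 0.95, and as partially conserved across at least half of the genomes when the second fraction is at least 0.5.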