Burkard Thomas R, Planyavsky Melanie, Kaupe Ines, Breitwieser Florian P, Bürckstümmer Tilmann, Bennett Keiryn L, Superti-Furga Giulio, Colinge Jacques
CeMM - Center for Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 19/3, A-1090 Vienna, Austria.
BMC Syst Biol. 2011 Jan 26;5:17. doi: 10.1186/1752-0509-5-17.
On the basis of large proteomics datasets measured from seven human cell lines, we take their intersection as an approximation of the human central proteome, defined as the set of proteins ubiquitously expressed in all human cells. We investigate the composition and properties of the central proteome through bioinformatics analyses.
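The intersection step described above can be illustrated with a minimal sketch. It assumes each cell line's identifications are already reduced to a set of protein identifiers. The function name `central_proteome`, the cell line labels, and the protein identifiers are hypothetical placeholders, not the authors' pipeline or data.

```python
def central_proteome(datasets):
    """Return the proteins identified in every cell line.

    `datasets` maps a cell line label to an iterable of protein
    identifiers (hypothetical structure, assumed for illustration).
    """
    if not datasets:
        return set()
    # Intersect the identification sets across all cell lines.
    return set.intersection(*(set(ids) for ids in datasets.values()))


# Toy data with made-up labels and identifiers (not from the study).
toy = {
    "cell_line_1": {"PROT_A", "PROT_B", "PROT_C"},
    "cell_line_2": {"PROT_B", "PROT_C", "PROT_D"},
    "cell_line_3": {"PROT_C", "PROT_B", "PROT_E"},
}

core = central_proteome(toy)
print(sorted(core))  # ['PROT_B', 'PROT_C']
```

A plain set intersection is strict: a protein missed in a single cell line is excluded, so in practice the result is a conservative approximation of the ubiquitously expressed set.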
We experimentally identify a central proteome comprising 1,124 proteins that are ubiquitously and abundantly expressed in human cells, using state-of-the-art mass spectrometry and protein identification bioinformatics. The main functions represented are proteostasis, primary metabolism, and proliferation. We further characterize the central proteome in terms of gene structures, conservation, interaction networks, pathways, drug targets, and coordination of biological processes. Among other new findings, we show that the central proteome is encoded by exon-rich genes, indicating increased regulatory flexibility through alternative splicing to adapt to multiple environments, and that the protein interaction network linking the central proteome is very efficient at synchronizing translation with other biological processes. Surprisingly, at least 10% of the central proteome has no or very limited functional annotation.
Our data and analysis provide a new and deeper description of the human central proteome than previous results, thereby extending and complementing our knowledge of commonly expressed human proteins. All data are made publicly available to help other researchers who, for instance, need to compare or link focused datasets to a common background.