Lange Philipp F, Huesgen Pitter F, Nguyen Karen, Overall Christopher M
Centre for Blood Research, University of British Columbia, 2350 Health Sciences Mall, Vancouver, British Columbia V6T 1Z3, Canada.
J Proteome Res. 2014 Apr 4;13(4):2028-44. doi: 10.1021/pr401191w. Epub 2014 Mar 10.
A goal of the Chromosome-centric Human Proteome Project is to identify all human protein species. With 3844 proteins annotated as "missing", this is a challenging task. Moreover, proteolytic processing generates new protein species with characteristic neo-N termini, which are frequently accompanied by altered half-lives, function, interactions, and location. Enucleated and largely void of internal membranes and organelles, erythrocytes are simple cells. They are nonetheless proteomically challenging, because their high hemoglobin content and the wide dynamic range of protein concentrations impede protein identification. Using the N-terminomics procedure TAILS, we identified 1369 natural and neo-N-termini and 1234 proteins in human erythrocytes. Multiple semitryptic N-terminal peptides exhibited better mass spectrometric identification properties than the corresponding intact tryptic peptides. This enabled the identification of 281 novel erythrocyte proteins and of six missing proteins identified for the first time in the human proteome. Using an improved bioinformatics workflow, we developed a new classification system and the Terminus Cluster Score. With these, we described a new stabilizing N-end rule for processed protein termini that discriminates novel protein species from degradation remnants, and we identified protein domain hot spots susceptible to cleavage. Strikingly, 68% of the N-termini lay within genome-encoded protein sequences, revealing alternative translation initiation sites, pervasive endoproteolytic processing, and in vivo stabilization of protein fragments. The mass spectrometry proteomics data have been deposited to ProteomeXchange with the data set identifier
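To make two ideas in the abstract concrete (an N-terminus that lies at the start of the protein versus one that lies within the sequence, and a "semitryptic" peptide whose two ends do not both match trypsin), here is a minimal Python sketch. It is not the authors' classification system or the Terminus Cluster Score, whose definitions are not given here. The function names, labels, and toy sequence are hypothetical, and trypsin specificity is simplified to cleavage after K or R but not before P. The `cleave_after` parameter is an assumption added so the rule can be narrowed, for example to `"R"` when lysines are chemically blocked and no longer cleaved.

```python
# Hypothetical helpers illustrating N-terminal peptide classification.
# A simplified sketch; not the authors' classification system or Terminus Cluster Score.


def is_tryptic_boundary(protein: str, index: int, cleave_after: str = "KR") -> bool:
    """Return True if a peptide boundary just before protein[index] fits trypsin.

    Simplified rule: cleavage after a residue in `cleave_after` unless the next
    residue is P. The protein's own N- and C-termini always count as valid.
    """
    if index <= 0 or index >= len(protein):
        return True
    return protein[index - 1] in cleave_after and protein[index] != "P"


def locate(protein: str, peptide: str) -> int:
    """Return the 0-based start of the FIRST occurrence of peptide in protein."""
    start = protein.find(peptide)
    if start < 0:
        raise ValueError(f"peptide {peptide!r} not found in protein sequence")
    return start


def classify_n_terminus(protein: str, peptide: str) -> str:
    """Label where an N-terminal peptide starts relative to the protein sequence."""
    start = locate(protein, peptide)
    if start == 0:
        return "natural_met_retained"  # starts at position 1
    if start == 1 and protein[0] == "M":
        return "natural_met_removed"  # starts at position 2 after Met excision
    return "internal"  # candidate neo-N-terminus or alternative translation start


def is_semitryptic(protein: str, peptide: str, cleave_after: str = "KR") -> bool:
    """True if exactly one of the peptide's two ends matches the trypsin rule."""
    start = locate(protein, peptide)
    end = start + len(peptide)
    n_ok = is_tryptic_boundary(protein, start, cleave_after)
    c_ok = is_tryptic_boundary(protein, end, cleave_after)
    return n_ok != c_ok


def fraction_internal(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (protein, peptide) pairs whose N-terminus is internal."""
    labels = [classify_n_terminus(protein, peptide) for protein, peptide in pairs]
    return sum(label == "internal" for label in labels) / len(labels)


if __name__ == "__main__":
    toy = "MSTAKGLVRDEPKAAGYR"  # made-up sequence, not a real erythrocyte protein
    peptides = ["MSTAK", "STAK", "LVRDEPK", "GLVR"]
    for pep in peptides:
        print(pep, classify_n_terminus(toy, pep), is_semitryptic(toy, pep))
    print(fraction_internal([(toy, p) for p in peptides]))
```

On the made-up sequence, MSTAK and STAK are labelled as natural N-termini and LVRDEPK and GLVR as internal, so half of the four termini are internal. STAK and LVRDEPK are semitryptic; GLVR is internal yet fully tryptic, which is why such a peptide alone cannot distinguish a genuine neo-N-terminus from a trypsin-generated end. The sketch maps each peptide only to its first match in one sequence; a real workflow would map peptides against all protein entries and isoforms.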