Roth Michael J, Forbes Andrew J, Boyne Michael T, Kim Yong-Bin, Robinson Dana E, Kelleher Neil L
Department of Chemistry, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA.
Mol Cell Proteomics. 2005 Jul;4(7):1002-8. doi: 10.1074/mcp.M500064-MCP200. Epub 2005 Apr 28.
The human proteome is a highly complex extension of the genome wherein a single gene often produces distinct protein forms due to alternative splicing, RNA editing, polymorphisms, and posttranslational modifications. Such biological variation compounded by the high sequence identity within gene families currently overwhelms the complete and routine characterization of mammalian proteins by MS. A new database of human proteins (and their possible variants) was created and searched using tandem mass spectrometric data from intact proteins. This first application of top-down MS/MS to wild-type human proteins demonstrates both gene-specific identification and the unambiguous characterization of multifaceted mass shifts (Δm values). Such Δm values found from the precise identification of 45 protein forms from HeLa cells reveal 34 coding single nucleotide polymorphisms, two protein forms from alternative splicing, and 12 diverse modifications (not including simple N-terminal processing), including a previously unknown phosphorylation at 10% occupancy. Automated protein identification was achieved with a median expectation value of 10^(-13) and often occurred simultaneously with dissection of diverse sources of protein variability as they occur in combination. Top-down MS therefore has a bright future for enabling precise annotation of gene products expressed from the human genome by non-mass spectrometrists.