Lane Lydie, Bairoch Amos, Beavis Ronald C, Deutsch Eric W, Gaudet Pascale, Lundberg Emma, Omenn Gilbert S
SIB-Swiss Institute of Bioinformatics, CMU - Rue Michel-Servet 1, 1211 Geneva, Switzerland.
J Proteome Res. 2014 Jan 3;13(1):15-20. doi: 10.1021/pr401144x. Epub 2013 Dec 23.
One year ago the Human Proteome Project (HPP) leadership designated neXtProt as the basis for the HPP baseline metrics, with a total of 13,664 proteins validated at protein evidence level 1 (PE1) by mass spectrometry, antibody capture, Edman sequencing, or 3D structures. Corresponding chromosome-specific data were provided from PeptideAtlas, GPMdb, and the Human Protein Atlas (HPA). This year, the neXtProt total is 15,646, and the other resources, which are inputs to neXtProt, have high-quality identifications and additional annotations for 14,012 proteins in PeptideAtlas, 14,869 in GPMdb, and 10,976 in HPA. We propose to remove from the denominator 638 genes that are "uncertain" or "dubious" in Ensembl, UniProt/SwissProt, and neXtProt. That leaves 3,844 "missing proteins", currently having no or inadequate documentation, to be found from a new denominator of 19,490 protein-coding genes. We present those tabulations and web links and discuss current strategies to find the missing proteins.
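The counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that "missing proteins" means the new denominator minus the current neXtProt PE1 total, which is consistent with the quoted figures but is not stated as a formula in the abstract. The prior denominator is likewise inferred (new denominator plus the 638 removed genes), and all variable and function names are illustrative.

```python
# Figures quoted in the abstract.
PE1_LAST_YEAR = 13_664      # neXtProt PE1 proteins, baseline one year earlier
PE1_THIS_YEAR = 15_646      # neXtProt PE1 proteins, this year
REMOVED_GENES = 638         # "uncertain" or "dubious" genes dropped from the denominator
NEW_DENOMINATOR = 19_490    # protein-coding genes after the removal
REPORTED_MISSING = 3_844    # "missing proteins" as reported


def missing_proteins(denominator: int, pe1_total: int) -> int:
    """Assumed definition: genes in the denominator without PE1 evidence."""
    return denominator - pe1_total


missing = missing_proteins(NEW_DENOMINATOR, PE1_THIS_YEAR)

# Inferred, not quoted in the abstract: the denominator before the removal.
prior_denominator = NEW_DENOMINATOR + REMOVED_GENES

# Year-over-year gain in PE1 proteins and the share of the new denominator covered.
pe1_gain = PE1_THIS_YEAR - PE1_LAST_YEAR
coverage = PE1_THIS_YEAR / NEW_DENOMINATOR

print(f"missing proteins:  {missing}")
print(f"prior denominator: {prior_denominator} (inferred)")
print(f"PE1 gain:          {pe1_gain}")
print(f"PE1 coverage:      {coverage:.1%}")
```

Under that assumption the computed value matches the reported 3,844, and PE1 proteins cover roughly four fifths of the revised denominator.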