Zheng Clarissa, Andken Marie, Mudge Jonathan M, Magrane Michele, Orchard Sandra, Sun Zhi, Deutsch Eric W
Institute for Systems Biology, Seattle, Washington 98109, United States.
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, U.K.
J Proteome Res. 2025 Jul 4;24(7):3507-3533. doi: 10.1021/acs.jproteome.5c00167. Epub 2025 Jun 12.
One aim of the international Human Proteome Organization (HUPO) Human Proteome Project (HPP) is to obtain high-confidence translation evidence for every human protein-coding gene in its target list of 19,433 entries, which is based on the protein-coding genes from Ensembl-GENCODE. However, 76 of these entries are annotated in UniProtKB (as of release 2024_06) with PE5, indicating a manual curator's skepticism about the protein's existence, so it is unclear whether they belong in the HPP target list. Here, we review these 76 entries by assembling evidence from the literature, reference databases, and genome alignments with other species to determine whether they should be released from PE5 status and annotated with PE1-4 in UniProtKB. We find that 17 have credible translation evidence and should therefore be upgraded to PE1. Another 15 lack translation evidence but have transcription evidence and the evolutionary hallmarks of protein-coding genes, and are presumed to produce functional proteins. A further 41 have no translational or transcriptional evidence, although they still bear the evolutionary hallmarks of protein-coding genes; it remains unclear whether these are protein-coding, so their representation becomes a matter of policy. Only 3 entries still seem best categorized as PE5 and excluded from the HUPO-HPP target list.