Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou 510632, China.
J Proteome Res. 2024 Jul 5;23(7):2323-2331. doi: 10.1021/acs.jproteome.3c00674. Epub 2024 Jun 12.
The Chromosome-Centric Human Proteome Project (C-HPP) aims to identify all proteins encoded by the human genome. The human proteome currently still contains approximately 2000 PE2-PE5 proteins, that is, products of annotated coding genes that lack sufficient protein-level evidence. Over the past 10 years, identifying PE2-PE5 proteins with C-HPP approaches has become increasingly difficult because these proteins are rarely observed. We therefore proposed that reanalyzing massive MS data sets held in repositories with newly developed algorithms may increase the number of peptide observations for these proteins. In this study, we downloaded 1000 MS data sets via the ProteomeXchange database. Using pFind software, we identified peptides mapping to 1788 PE2-PE5 proteins. Among them, 27 proteins (11 PE2 and 16 PE5) were identified with at least 2 peptides, and 12 of these 27 were identified with 2 peptides in a single data set, following the criteria of the HPP guidelines. We found translation evidence for 16 of the 27 proteins in our RNC-seq data, supporting their existence. The properties of the PE2 and PE5 proteins were similar to those of the PE1 proteins. Our approach demonstrates that mining massive data repositories for PE2 and PE5 proteins is still worthwhile, and that peptide identifications across multiple data sets may support the presence of PE2 and PE5 proteins or at least prompt additional validation studies. Extremely high-throughput analysis could be one way to find more PE2 and PE5 proteins.
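The two-tier peptide count described in the abstract (at least 2 peptides across all data sets, versus 2 peptides within a single data set) can be illustrated with a minimal sketch. This is not the authors' pipeline and does not use pFind: the function name `classify_proteins`, the `(dataset_id, protein_id, peptide)` input layout, and all identifiers in the example are hypothetical. The full HPP guidelines impose further requirements on qualifying peptides that are not modelled here.

```python
from collections import defaultdict


def classify_proteins(identifications, min_peptides=2):
    """Group candidate proteins by how their peptide evidence is distributed.

    identifications: iterable of (dataset_id, protein_id, peptide) tuples.
    This layout is an assumption made for illustration only.

    Returns two sets:
      multi  - proteins with >= min_peptides distinct peptides over all data sets
      single - the subset of `multi` with >= min_peptides distinct peptides
               inside at least one individual data set
    """
    # protein -> distinct peptides pooled across every data set
    pooled = defaultdict(set)
    # protein -> data set -> distinct peptides seen in that data set
    per_dataset = defaultdict(lambda: defaultdict(set))

    for dataset_id, protein_id, peptide in identifications:
        pooled[protein_id].add(peptide)
        per_dataset[protein_id][dataset_id].add(peptide)

    multi = {p for p, peps in pooled.items() if len(peps) >= min_peptides}
    single = {
        p
        for p in multi
        if any(len(peps) >= min_peptides for peps in per_dataset[p].values())
    }
    return multi, single


if __name__ == "__main__":
    # Hypothetical identifications; none of these IDs or sequences are real data.
    example = [
        ("DATASET_A", "PROT_X", "AAAAAAAAAK"),
        ("DATASET_A", "PROT_X", "CCCCCCCCCR"),  # 2 peptides in one data set
        ("DATASET_A", "PROT_Y", "DDDDDDDDDK"),
        ("DATASET_B", "PROT_Y", "EEEEEEEEER"),  # 2 peptides, split over 2 data sets
        ("DATASET_B", "PROT_Z", "FFFFFFFFFK"),
        ("DATASET_B", "PROT_Z", "FFFFFFFFFK"),  # same peptide twice counts once
    ]
    multi, single = classify_proteins(example)
    print(sorted(multi))
    print(sorted(single))
```

In this toy input, `PROT_Y` reaches 2 peptides only when data sets are pooled, which corresponds to the multi-data-set case the abstract describes as supporting evidence that may prompt additional validation.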