Biofluid and Biomarker Center, Niigata University, Niigata 950-2181, Japan.
Graduate School of Science and Technology, Niigata University, Niigata 950-2181, Japan.
J Proteome Res. 2017 Dec 1;16(12):4403-4414. doi: 10.1021/acs.jproteome.7b00423. Epub 2017 Oct 31.
As part of the effort to complete the Human Proteome Project (HPP), the Chromosome-Centric Human Proteome Project (C-HPP) began investigating missing proteins (MPs) in 2012. However, 2579 and 572 protein entries in neXtProt (2017-1) are still considered missing and uncertain proteins, respectively. In this study, we therefore propose a pipeline to analyze, identify, and validate human missing and uncertain proteins in open-access transcriptomics and proteomics databases. Analysis of RNA expression patterns for missing proteins in the Human Protein Atlas showed that 28% of them, such as olfactory receptor 1I1 (O60431), had no RNA expression, suggesting the need to consider uncommon tissues in transcriptomic and proteomic studies. Interestingly, 21% had an elevated expression level in a particular tissue (tissue-enriched proteins), indicating the importance of targeting such proteins in the tissues where they are elevated. Additionally, analysis of RNA expression levels showed that 95% of missing proteins had no or low expression (0-10 transcripts per million), indicating that low abundance is one of the major obstacles to detecting missing proteins. Moreover, missing proteins are predicted to generate fewer unique tryptic peptides than identified proteins. Searching for the predicted unique tryptic peptides of missing and uncertain proteins in the experimental peptide lists of open-access MS-based databases (PA, GPM) detected 402 missing and 19 uncertain proteins with at least two unique peptides (≥9 aa) at <(5 × 10)% FDR. Finally, matching the native spectra of the experimentally detected peptides with their SRMAtlas synthetic counterparts from three transition sources (QQQ, QTOF, QTRAP) allowed us to validate 41 missing proteins by ≥2 proteotypic peptides.
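The peptide-matching step described in the abstract (predict unique tryptic peptides, look them up in an experimental peptide list, keep proteins with at least two unique peptides of ≥9 aa) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the toy accessions P_A and P_B, and the toy sequences are hypothetical, and the cleavage rule (cut after K or R, but not before P, with no missed cleavages) is an assumed simplification. Uniqueness is checked only among fully tryptic peptides of the supplied sequences, and FDR filtering is not modelled.

```python
import re
from collections import defaultdict

MIN_LEN = 9        # minimum peptide length in aa, as stated in the abstract
MIN_PEPTIDES = 2   # unique peptides required per protein, as stated in the abstract


def tryptic_peptides(sequence):
    """In silico digest: cleave after K or R, except before P (assumed rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def unique_peptides(proteome):
    """Map each accession to its peptides (>= MIN_LEN) found in no other protein."""
    owners = defaultdict(set)
    for accession, sequence in proteome.items():
        for peptide in tryptic_peptides(sequence):
            if len(peptide) >= MIN_LEN:
                owners[peptide].add(accession)
    unique = defaultdict(set)
    for peptide, accessions in owners.items():
        if len(accessions) == 1:
            unique[next(iter(accessions))].add(peptide)
    return unique


def detected_proteins(proteome, targets, observed):
    """Return target proteins supported by >= MIN_PEPTIDES observed unique peptides."""
    unique = unique_peptides(proteome)
    hits = {}
    for accession in targets:
        found = unique.get(accession, set()) & observed
        if len(found) >= MIN_PEPTIDES:
            hits[accession] = sorted(found)
    return hits


# Hypothetical toy data: two proteins that share their first peptide.
proteome = {
    "P_A": "MAAAAAAAAKCCCCCCCCCRDDDDDDDDDK",
    "P_B": "MAAAAAAAAKEEEEEEEEERFFFFFFFFFK",
}
# Hypothetical experimental peptide list (stand-in for a PA or GPM export).
observed = {"MAAAAAAAAK", "CCCCCCCCCR", "DDDDDDDDDK", "EEEEEEEEER"}

hits = detected_proteins(proteome, ["P_A", "P_B"], observed)
print(hits)
```

In this toy case only P_A passes: the peptide shared by both proteins is discarded as non-unique, and P_B is left with a single observed unique peptide, which falls below the two-peptide threshold.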