J Proteome Res. 2018 Dec 7;17(12):4227-4234. doi: 10.1021/acs.jproteome.8b00496. Epub 2018 Oct 15.
High-throughput tandem mass spectrometry has enabled the detection and identification of over 75% of all proteins predicted to be translated from genes in the human genome. In fact, the rapid pace of mass spectrometry data acquisition and sharing has made many tens of terabytes of public data available in thousands of human data sets. The systematic reanalysis of these public data sets has been used to build a community-scale spectral library of 2.1 million precursors for over 1 million unique sequences from over 19,000 proteins (including spectra of synthetic peptides). However, it has remained challenging to find and inspect spectra of peptides covering functional protein regions or matching novel proteins. ProteinExplorer addresses these challenges with an intuitive interface that maps tens of millions of identifications to functional sites on nearly all human proteins while maintaining provenance for every identification back to the original data set and data file. Additionally, ProteinExplorer facilitates the selection and inspection of HPP-compliant peptides whose spectra can be matched to spectra of synthetic peptides, and it already includes HPP-compliant evidence for 107 missing (PE2, PE3, and PE4) and 23 dubious (PE5) proteins. Finally, ProteinExplorer allows users to rate spectra and to contribute to PrEdict (Protein Existence dictionary), a community library of peptides that map to novel proteins but whose preliminary identities have not yet been fully established with community-scale false discovery rates and synthetic peptide spectra. ProteinExplorer can now be accessed at https://massive.ucsd.edu/ProteoSAFe/protein_explorer_splash.jsp .