Anhui Medical University, Hefei 230032, China.
State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China.
J Proteome Res. 2018 Jul 6;17(7):2335-2344. doi: 10.1021/acs.jproteome.8b00032. Epub 2018 Jun 25.
Microproteins are peptides of 100 amino acids (AA) or fewer, encoded by small open reading frames (smORFs). Microproteins have been shown to participate in and regulate a wide range of cellular functions. However, their annotation and identification are challenging, in part owing to their low molecular weight, low abundance, and hydrophobicity. These factors have left smORFs unannotated during genome processing and have made microproteins difficult to identify at the protein level. Large-scale enrichment of microproteins in proteogenomics makes it possible to efficiently identify microproteins and discover unannotated smORFs in Saccharomyces cerevisiae. We integrated four microprotein-specific enrichment strategies to enhance coverage. We identified 117 microproteins, verified 31 missing proteins (MPs), and discovered 3 novel smORFs. The 31 MPs were confirmed by spectrum quality checking. The three novel smORFs (YKL104W-A, YHR052C-B, and YHR054C-B) were retained after spectrum quality checking, peptide synthesis, homologue matching, and other checks. This study not only demonstrates that an extensively studied organism still contains smORF candidates awaiting annotation but also presents an efficient strategy for the discovery of small MPs. All MS data sets have been deposited in ProteomeXchange under the identifier PXD008586.
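To make the 100-AA cutoff in the definition concrete, the following is a minimal sketch of a forward-strand smORF scan. It is an illustration only, not the pipeline used in the study: the function name `find_smorfs`, the ATG-only start rule, and the restriction to the three forward frames are all assumptions made for brevity.

```python
# Illustrative sketch only: names and rules below are assumptions,
# not the method used in the study.
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_AA = 100  # microprotein length cutoff taken from the definition above


def find_smorfs(dna, max_aa=MAX_AA, min_aa=1):
    """Return (start, end, n_aa) for forward-strand ORFs encoding <= max_aa residues.

    start/end are 0-based, end-exclusive nucleotide coordinates that include
    the stop codon; n_aa counts residues including the initiator Met and
    excluding the stop codon.
    """
    dna = dna.upper()
    hits = []
    for frame in range(3):
        start = None  # position of the first ATG of the currently open ORF
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if min_aa <= n_aa <= max_aa:
                    hits.append((start, i + 3, n_aa))
                start = None  # close the ORF and look for the next ATG
    return hits
```

For example, `find_smorfs("ATGGCTTGA")` reports a single two-residue ORF spanning the whole sequence, whereas an ORF encoding 101 residues is excluded by the cutoff. A realistic search would also cover the reverse strand and typically apply a minimum length to limit spurious hits.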