State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Research Unit of Proteomics & Research and Development of New Drug of Chinese Academy of Medical Sciences, Beijing Institute of Lifeomics, Beijing 102206, P. R. China.
Center for Neurodegenerative Diseases, Emory Proteomics Service Center, and Department of Biochemistry, Emory University School of Medicine, Atlanta, Georgia 30322, United States.
J Proteome Res. 2021 Feb 5;20(2):1328-1340. doi: 10.1021/acs.jproteome.0c00721. Epub 2021 Jan 14.
Proteomics approaches designed to catalogue all open reading frames (ORFs) expressed by an organism under a defined set of growth conditions have flourished in recent years. However, no proteome has yet been completely sequenced. Here, we generate the largest yeast proteome data set to date, comprising 5610 identified proteins, using a strategy based on optimized sample preparation and high-resolution mass spectrometry. Of the 5610 identified proteins, 94.1% are core proteins, giving near-complete coverage of the yeast ORFs. A comprehensive analysis of missing proteins showed that proteins are missed mainly because of their physical properties. Analysis of protein abundance shows that our proteome spans an exceptionally broad dynamic range. Moreover, these abundance values correlate strongly with mRNA abundance, indicating a high level of accuracy, sensitivity, and precision. We present examples of how the data can be used, including reannotating gene localization and providing expression evidence for pseudogenes. Our near-complete yeast proteome data set will be a useful resource for further systematic studies.