Wu Peng, Zhang Hongyu, Lin Weiran, Hao Yunwei, Ren Liangliang, Zhang Chengpu, Li Ning, Wei Handong, Jiang Ying, He Fuchu
State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, 33 Life Science Park Road, Beijing 102206, China.
J Proteome Res. 2014 May 2;13(5):2409-19. doi: 10.1021/pr4012206. Epub 2014 Apr 18.
Comprehensively identifying gene expression at both the transcriptomic and proteomic levels in a tissue is a prerequisite for a deeper understanding of its biological functions. Alternative splicing and RNA editing, two main forms of transcriptional processing, play important roles in transcriptome and proteome diversity and produce multiple isoforms per gene. These isoforms are hard to identify with mass spectrometry (MS)-based proteomics because standard protein databases contain relatively little isoform information. In our study, we applied MS and RNA-Seq in parallel to mouse liver tissue and captured a considerable catalogue of transcripts and proteins that covered 60% and 34%, respectively, of the protein-coding genes in Ensembl. We then developed a bioinformatics workflow for building a customized protein database that, for the first time, included new splicing-derived peptides and RNA-editing-caused peptide variants, allowing us to identify protein isoforms more completely. Using this experimentally determined database, we identified a total of 150 peptides not present in standard biological databases at a false discovery rate of <1%, corresponding to 72 novel splicing isoforms, 43 new genetic regions, and 15 RNA-editing sites. Of these, 11 randomly selected novel events passed experimental verification by PCR and Sanger sequencing. These high-confidence discoveries of new gene products at two omics levels demonstrate the robustness and effectiveness of our approach and its potential for improving genome annotation. All the MS data have been deposited in iProx (http://www.iprox.org) with the identifier IPX00003601.