Crappé Jeroen, Ndah Elvis, Koch Alexander, Steyaert Sandra, Gawron Daria, De Keulenaer Sarah, De Meester Ellen, De Meyer Tim, Van Criekinge Wim, Van Damme Petra, Menschaert Gerben
Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium.
Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium; Department of Medical Protein Research, Flemish Institute of Biotechnology, Ghent, Belgium; Department of Biochemistry, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium.
Nucleic Acids Res. 2015 Mar 11;43(5):e29. doi: 10.1093/nar/gku1283. Epub 2014 Dec 15.
An increasing number of studies integrate mRNA sequencing data into MS-based proteomics to complement the translation product search space. However, several factors, including extensive regulation of mRNA translation and the need for three- or six-frame translation, impede the use of mRNA-seq data for the construction of a protein sequence search database. With that in mind, we developed the PROTEOFORMER tool, which automatically processes data from the recently developed ribosome profiling method (sequencing of ribosome-protected mRNA fragments), resulting in genome-wide visualization of ribosome occupancy. Our tool also includes a translation initiation site calling algorithm that allows the delineation of the open reading frames (ORFs) of all translation products. A complete protein synthesis-based sequence database can thus be compiled for mass spectrometry-based identification. This approach increases the overall protein identification rates by 3% and 11% (improved and new identifications) for human and mouse, respectively, and enables proteome-wide detection of 5'-extended proteoforms, upstream ORF translation and near-cognate translation start sites. The PROTEOFORMER tool is available as a stand-alone pipeline and has been implemented in the Galaxy framework for ease of use.
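As a rough illustration of ORF delineation from a called translation initiation site, the Python sketch below translates a transcript from a given start position to the first in-frame stop codon. It is not PROTEOFORMER code: the function names (`is_start_like`, `orf_from_tis`), the toy transcript, the definition of near-cognate as "one mismatch from ATG", and the choice to write the first residue as methionine at near-cognate starts are all assumptions made for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_start_like(codon):
    """Return True for ATG or a codon differing from ATG at one position.

    Assumption: "near-cognate" is taken to mean a single mismatch from ATG.
    """
    return len(codon) == 3 and sum(a != b for a, b in zip(codon, "ATG")) <= 1


def orf_from_tis(transcript, tis):
    """Translate from a 0-based start position to the first in-frame stop.

    Returns the protein sequence, or None if the position is not a
    start-like codon, an unknown base is met, or no in-frame stop exists.
    """
    seq = transcript.upper()
    if not is_start_like(seq[tis:tis + 3]):
        return None
    # Assumption: the first residue is written as Met even at near-cognate starts.
    protein = ["M"]
    for i in range(tis + 3, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3])
        if aa is None:       # ambiguous or non-DNA character
            return None
        if aa == "*":        # first in-frame stop closes the ORF
            return "".join(protein)
        protein.append(aa)
    return None


# Hypothetical transcript: an upstream in-frame CTG followed by the ATG start.
toy_transcript = "CCCTGGCCATGAAATAAGG"
canonical = orf_from_tis(toy_transcript, 8)   # starts at ATG
extended = orf_from_tis(toy_transcript, 2)    # starts at upstream CTG
```

In this toy case the ORF called from the upstream CTG ends at the same stop codon as the ATG-initiated ORF, so it corresponds to a 5'-extended proteoform of the same product. Supplying start positions from ribosome profiling avoids translating every reading frame of every transcript.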