School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN.
Mol Cell Proteomics. 2019 Aug 9;18(8 suppl 1):S183-S192. doi: 10.1074/mcp.TIR118.001233. Epub 2019 May 29.
Matching metagenomic and/or metatranscriptomic data, currently often underused, can be a useful reference for the analysis of metaproteomic tandem mass spectra (MS/MS) data. Here we developed a software pipeline that identifies peptides and proteins from metaproteomic MS/MS data, using proteins derived from matching metagenomic (and metatranscriptomic) data as the search database. The pipeline is based on two novel approaches, Graph2Pro (published) and Var2Pep (new). Graph2Pro retains and uses the uncertainties of metagenome assembly for reference-based MS/MS data analysis. Var2Pep considers the variations found in metagenomic/metatranscriptomic sequencing reads that are not retained in the assemblies (contigs). The new software pipeline provides a one-stop application of both tools, and it supports metagenome assemblies from commonly used assemblers, including MegaHit and metaSPAdes. When tested on two collections of multi-omic microbiome data sets, our pipeline significantly improved the identification rate of the metaproteomic MS/MS spectra, by about twofold compared with conventional contig- or read-based approaches (Var2Pep alone identified 5.6% to 24.1% more unique peptides, depending on the data set). We also showed that the identified variant peptides are important for functional profiling of microbiomes. All results suggest that it is important to take assembly uncertainties and genomic variants into consideration to facilitate metaproteomic MS/MS data interpretation.