Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany.
Hasso Plattner Institute, Faculty for Digital Engineering, University of Potsdam, Potsdam, Germany.
Nat Protoc. 2020 Oct;15(10):3212-3239. doi: 10.1038/s41596-020-0368-7. Epub 2020 Aug 28.
Metaproteomics, the study of the collective protein composition of multi-organism systems, provides deep insights into the biodiversity of microbial communities and the complex functional interplay between microbes and their hosts or environment. Thus, metaproteomics has become an indispensable tool in various fields such as microbiology and related medical applications. The computational challenges in the analysis of corresponding datasets differ from those of pure-culture proteomics, e.g., due to the higher complexity of the samples and the larger reference databases demanding specific computing pipelines. Corresponding data analyses usually consist of numerous manual steps that must be closely synchronized. With MetaProteomeAnalyzer and Prophane, we have established two open-source software solutions specifically developed and optimized for metaproteomics. Among other features, peptide-spectrum matching is improved by combining different search engines and, compared to similar tools, metaproteome annotation benefits from the most comprehensive set of available databases (such as NCBI, UniProt, EggNOG, PFAM, and CAZy). The workflow described in this protocol combines both tools and leads the user through the entire data analysis process, including protein database creation, database search, protein grouping and annotation, and results visualization. To the best of our knowledge, this protocol presents the most comprehensive, detailed and flexible guide to metaproteomics data analysis to date. While beginners are provided with robust, easy-to-use, state-of-the-art data analysis in a reasonable time (a few hours, depending on, among other factors, the protein database size and the number of identified peptides and inferred proteins), advanced users benefit from the flexibility and adaptability of the workflow.