Kleikamp Hugo B C, van der Zwaan Ramon, van Valderen Ramon, van Ede Jitske M, Pronk Mario, Schaasberg Pim, Allaart Maximilienne T, van Loosdrecht Mark C M, Pabst Martin
Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, Delft 2629HZ, The Netherlands.
ISME Commun. 2024 Oct 12;4(1):ycae121. doi: 10.1093/ismeco/ycae121. eCollection 2024 Jan.
Tremendous advances in mass spectrometric and bioinformatic approaches have expanded proteomics into the field of microbial ecology. The commonly used spectral annotation method for metaproteomics data relies on database searching, which requires sample-specific databases obtained from whole metagenome sequencing experiments. However, creating these databases is complex, time-consuming, and prone to errors, potentially biasing experimental outcomes and conclusions. This calls for alternative approaches that can provide rapid and orthogonal insights into metaproteomics data. Here, we present NovoLign, a metaproteomics pipeline that performs sequence alignment of de novo sequences from complete metaproteomics experiments. The pipeline enables rapid taxonomic profiling of complex communities and evaluates the taxonomic coverage of metaproteomics outcomes obtained from database searches. Furthermore, the NovoLign pipeline supports the creation of reference sequence databases for database searching to ensure comprehensive coverage. We assessed the NovoLign pipeline for taxonomic coverage and false positive annotations using a wide range of in silico and experimental data, including pure reference strains, laboratory enrichment cultures, synthetic communities, and environmental microbial communities. In summary, we present NovoLign, a metaproteomics pipeline that employs large-scale sequence alignment to enable rapid taxonomic profiling, evaluation of database searching outcomes, and the creation of reference sequence databases. The NovoLign pipeline is publicly available via: https://github.com/hbckleikamp/NovoLign.