Rajczewski Andrew T, Mehta Subina, Wagner Reid, Gabriel Wassim, Johnson James, Do Katherine, Vintila Simina, Wilhelm Mathias, Kleiner Manuel, Searle Brian C, Griffin Timothy J, Jagtap Pratik D
University of Minnesota, Minneapolis, MN.
Computational Mass Spectrometry, Technical University of Munich, Freising, Germany.
bioRxiv. 2025 May 20:2025.05.15.654320. doi: 10.1101/2025.05.15.654320.
Mass spectrometry-based metaproteomics, the identification and quantification of thousands of proteins expressed by complex microbial communities, has become pivotal for unraveling functional interactions within microbiomes. However, metaproteomics data analysis faces many challenges, including searching tandem mass spectra against a protein sequence database with proteomics database search algorithms. We used a ground-truth dataset to compare a spectral library searching method against established database searching approaches. Mass spectrometry data collected by data-dependent acquisition (DDA-MS) were analyzed with database searching approaches (MaxQuant and FragPipe) and with Scribe using Prosit-predicted spectral libraries. We used FASTA databases that included protein sequences from the microbial species present in the ground-truth dataset along with background protein sequences to estimate error rates and to assess the effects on detection, peptide-spectrum match quality, and quantification. Scribe detected more proteins at a 1% false discovery rate (FDR) than MaxQuant or FragPipe, while FragPipe detected more peptides verified by PepQuery. Scribe was able to detect more low-abundance proteins in the microbiome dataset and was more accurate in quantifying the microbial community composition. This research provides insights and guidance for metaproteomics researchers aiming to optimize results in their analysis of DDA-MS data.
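The abstract reports results at a 1% FDR and uses background protein sequences to estimate error rates. As a rough illustration only (not the authors' pipeline, and not how MaxQuant, FragPipe, or Scribe implement filtering), the sketch below assumes a target-decoy search that reports one score per identification (higher is better). It computes q-values to keep target matches at 1% FDR, then counts how many accepted matches fall on background species known to be absent from the sample. The `Match` fields and the functions `filter_at_fdr` and `background_hit_rate` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Match:
    score: float    # search-engine score; higher is better (assumption)
    is_decoy: bool  # True if the match hit a decoy (e.g. reversed) sequence
    species: str    # species of the matched protein (hypothetical label)


def filter_at_fdr(matches, fdr=0.01):
    """Return target matches whose target-decoy q-value is <= fdr."""
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)

    # Estimated FDR at each score cutoff: decoys seen / targets seen.
    fdrs = []
    targets = decoys = 0
    for m in ranked:
        if m.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the minimum FDR at this cutoff or any lower-scoring one,
    # obtained by scanning from the bottom of the ranked list upward.
    qvals = [0.0] * len(ranked)
    running_min = float("inf")
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvals[i] = running_min

    return [m for m, q in zip(ranked, qvals) if not m.is_decoy and q <= fdr]


def background_hit_rate(accepted, present_species):
    """Fraction of accepted matches assigned to species absent from the sample."""
    if not accepted:
        return 0.0
    false_hits = sum(1 for m in accepted if m.species not in present_species)
    return false_hits / len(accepted)


if __name__ == "__main__":
    demo = [
        Match(10.0, False, "A"),
        Match(9.0, False, "A"),
        Match(8.0, False, "background"),
        Match(7.0, True, "decoy"),
        Match(6.0, False, "A"),
    ]
    kept = filter_at_fdr(demo, fdr=0.01)
    print(len(kept), background_hit_rate(kept, {"A"}))
```

In the toy run, the first three target matches pass the 1% threshold and one of them maps to a background species. This simple fraction does not correct for the relative sizes of the sample and background databases, which a more careful error-rate estimate would need to account for.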