Cultivated genome references for protein database construction and high-resolution taxonomic annotation in metaproteomics.

Author information

Gao Xiaowei, Liang Hewei, Hu Tongyuan, Zou Yuanqiang, Xiao Liang

Affiliations

BGI Research, Shenzhen, China.

BGI Research, Wuhan, China.

Publication information

Microbiol Spectr. 2025 Feb 4;13(2):e0175524. doi: 10.1128/spectrum.01755-24. Epub 2024 Dec 12.

DOI:10.1128/spectrum.01755-24

PMID:39665565

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11792528/

Abstract

Metaproteomics offers a profound understanding of the functional dynamics of the gut microbiome, which is crucial for personalized healthcare strategies. Selecting an appropriate database is a critical step for identifying peptides and proteins and for providing accurate taxonomic and functional annotations. A matched metagenome-derived database is considered the best choice, but it has limitations in identifying low-abundance organisms and in taxonomic resolution. Herein, we constructed a protein database (DBCGR2) based on Cultivated Genome Reference 2 (CGR2) and developed a complete peptide-centric analysis workflow for database searching and for taxonomic and functional annotation. This workflow was then compared with metagenomics-derived databases for the analysis of metaproteomic data. Our findings suggested that the identification performance of DBCGR2 was comparable to that of metagenomics-derived databases, with improved identification rates for peptides from low-abundance species. The database search results could be fully annotated using the pepTaxa taxonomic annotation approach developed in this study, and the taxonomic resolution was enhanced to the strain level. Additionally, the results demonstrated that the sensitivity of functional annotation could be enhanced by employing DBCGR2. Overall, DBCGR2 combined with pepTaxa can be considered an alternative for metaproteomic data analysis with superior analysis performance.

Importance

Mass spectrometry-based metaproteomics offers a profound understanding of gut microbial taxonomy and functionality. The databases used to analyze metaproteomic data are crucial, as they determine which proteins can be identified and linked to overall human health, as well as the quality of taxonomic and functional annotation. Among the most effective approaches for constructing protein databases is the use of metagenomic sequencing to create matched databases. However, databases derived from isolated genomes have yet to undergo rigorous testing of their efficacy and accuracy in protein identification and in taxonomic and functional annotation. Here, we constructed a protein database, DBCGR2, derived from Cultivated Genome Reference 2 (CGR2), together with a complete workflow for data analysis. We compared the performance of DBCGR2 and metagenomics-derived databases. Our results indicated that DBCGR2 can be regarded as an alternative to metagenomics-derived databases for metaproteomic data analysis.
