Contigs directed gene annotation (ConDiGA) for accurate protein sequence database construction in metaproteomics.

Affiliations

Department of Chemistry, and Shanghai Stomatological Hospital, Fudan University, Shanghai, 200000, China.

School of Computing, College of Engineering, Computing and Cybernetics, The Australian National University, Canberra, ACT, 2600, Australia.

Publication information

Microbiome. 2024 Mar 19;12(1):58. doi: 10.1186/s40168-024-01775-3.

DOI:10.1186/s40168-024-01775-3
PMID:38504332
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10949615/
Abstract

BACKGROUND

Microbiota are closely associated with human health and disease. Metaproteomics provides a direct means of identifying microbial proteins in microbiota for compositional and functional characterization. However, in-depth and accurate metaproteomics remains limited by the extreme complexity and high diversity of microbiota samples. It is generally recommended to use metagenomic data from the same samples to construct the protein sequence database for metaproteomic data analysis. Although different metagenomics-based database construction strategies have been developed, optimization of gene taxonomic annotation, which is extremely important for accurate metaproteomic analysis, has not been reported.

RESULTS

Herein, we propose an accurate taxonomic annotation pipeline for genes from metagenomic data, namely contigs directed gene annotation (ConDiGA), and use it to build a protein sequence database for metaproteomic analysis. We compared our pipeline (ConDiGA, or MD3) with two other popular annotation pipelines (MD1 and MD2). In MD1, genes were annotated directly against the whole bacterial genome database; in MD2, contigs were annotated against the whole bacterial genome database and the taxonomic information of the contigs was assigned to the genes; in MD3, the most confident species from the contig annotation results were taken as the reference for annotating genes. Three annotation tools, BLAST, Kaiju, and Kraken2, were compared. On a synthetic microbial community of 12 species, Kaiju with the MD3 pipeline outperformed the other combinations in constructing a protein sequence database from metagenomic data. Similar performance was observed with a fecal sample, as well as with in silico mixed datasets of the simulated microbial community and the fecal sample.
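The difference between the three strategies can be illustrated with a toy sketch. This is not the ConDiGA code: the function names, the dictionary-based inputs, and the `min_fraction` threshold are all hypothetical, and the abstract does not say how the "most confident" species are chosen. Here that step is approximated by keeping species that account for a minimum share of the annotated contig length, and the restricted reference in MD3 is approximated by filtering precomputed hits rather than by re-running an annotation tool.

```python
from collections import Counter


def md1(gene_hits):
    """MD1: give each gene its best-scoring hit against the full database."""
    return {g: max(hits, key=hits.get) for g, hits in gene_hits.items() if hits}


def md2(contig_taxon, gene_to_contig):
    """MD2: each gene inherits the taxon assigned to its contig."""
    return {g: contig_taxon[c] for g, c in gene_to_contig.items() if c in contig_taxon}


def confident_species(contig_taxon, contig_length, min_fraction=0.05):
    """Hypothetical confidence rule: keep species covering at least
    min_fraction of the total annotated contig length."""
    per_species = Counter()
    for contig, species in contig_taxon.items():
        per_species[species] += contig_length[contig]
    total = sum(per_species.values())
    if total == 0:
        return set()
    return {sp for sp, n in per_species.items() if n / total >= min_fraction}


def md3(gene_hits, contig_taxon, contig_length, min_fraction=0.05):
    """MD3 (sketch): annotate genes using only the confident species as reference.
    Genes with no hit among those species are left unannotated."""
    keep = confident_species(contig_taxon, contig_length, min_fraction)
    result = {}
    for gene, hits in gene_hits.items():
        restricted = {sp: score for sp, score in hits.items() if sp in keep}
        if restricted:
            result[gene] = max(restricted, key=restricted.get)
    return result


# Made-up example data: hit scores per gene, and one taxon per contig.
gene_hits = {
    "g1": {"E. coli": 90, "S. enterica": 95},
    "g2": {"B. subtilis": 80},
    "g3": {"S. enterica": 70},
}
gene_to_contig = {"g1": "c1", "g2": "c2", "g3": "c3"}
contig_taxon = {"c1": "E. coli", "c2": "B. subtilis", "c3": "S. enterica"}
contig_length = {"c1": 50000, "c2": 40000, "c3": 500}

md1_result = md1(gene_hits)
md2_result = md2(contig_taxon, gene_to_contig)
md3_result = md3(gene_hits, contig_taxon, contig_length)
```

In this made-up data, gene `g1` has a slightly better hit to a species supported only by one short contig. MD1 follows that hit, whereas the MD3 sketch drops the weakly supported species from the reference first and assigns `g1` to the species its contig belongs to. Gene `g3` stays unannotated under MD3 because its only hit is to the dropped species.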

CONCLUSIONS

Overall, we developed an optimized pipeline for gene taxonomic annotation to construct protein sequence databases. Our study addresses the current problem of taxonomic annotation reliability in metagenomics-derived protein sequence databases and can promote in-depth metaproteomic analysis of the microbiome. The unique metagenomic and metaproteomic datasets of the 12 bacterial species are publicly available as a standard benchmarking sample for evaluating analysis pipelines. The code of ConDiGA is openly available on GitHub for the analysis of microbiota samples.

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0951/10949615/8e8ce591e1fc/40168_2024_1775_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0951/10949615/a6a59014d74a/40168_2024_1775_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0951/10949615/52f68d8ced64/40168_2024_1775_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0951/10949615/f6e284ae1a1b/40168_2024_1775_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0951/10949615/e4990202e64a/40168_2024_1775_Fig5_HTML.jpg
