
Assessment of Common and Emerging Bioinformatics Pipelines for Targeted Metagenomics

Authors

Siegwald Léa, Touzet Hélène, Lemoine Yves, Hot David, Audebert Christophe, Caboche Ségolène

Affiliations

Gènes Diffusion, Douai, France.

CRIStAL (UMR CNRS 9189 University of Lille, Centre de Recherche en Informatique, Signal et Automatique de Lille) and Inria, Villeneuve d'Ascq, France.

Publication information

PLoS One. 2017 Jan 4;12(1):e0169563. doi: 10.1371/journal.pone.0169563. eCollection 2017.

DOI:10.1371/journal.pone.0169563
PMID:28052134
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5215245/
Abstract

Targeted metagenomics, also known as metagenetics, is a high-throughput sequencing application focusing on a nucleotide target in a microbiome to describe its taxonomic content. A wide range of bioinformatics pipelines are available to analyze sequencing outputs, and the choice of an appropriate tool is crucial and not trivial. No standard evaluation method exists for estimating the accuracy of a pipeline for targeted metagenomics analyses. This article proposes an evaluation protocol containing real and simulated targeted metagenomics datasets, and adequate metrics allowing us to study the impact of different variables on the biological interpretation of results. This protocol was used to compare six different bioinformatics pipelines in the basic user context: three common ones (mothur, QIIME and BMP) based on a clustering-first approach and three emerging ones (Kraken, CLARK and One Codex) using an assignment-first approach. Surprisingly, this study reveals that sequencing errors have a bigger impact on the results than the choice of amplified region. Moreover, increasing sequencing throughput increases richness overestimation, even more so for microbiota of high complexity. Finally, the choice of the reference database has a bigger impact on richness estimation for clustering-first pipelines, and on correct taxa identification for assignment-first pipelines. Using emerging assignment-first pipelines is a valid approach for targeted metagenomics analyses, with a quality of results comparable to popular clustering-first pipelines, even with an error-prone sequencing technology like Ion Torrent. However, those pipelines are highly sensitive to the quality of databases and their annotations, which makes clustering-first pipelines still the only reliable approach for studying microbiomes that are not well described.
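The abstract refers to two kinds of outcome, richness overestimation and correct taxa identification, without defining the metrics. As a rough illustration only, the sketch below scores a pipeline's output against a community of known composition. The function name `evaluate_profile`, the `min_reads` cutoff, the metric definitions, and the example genera are all assumptions made for this sketch; they are not taken from the paper's protocol.

```python
def evaluate_profile(expected_taxa, observed_counts, min_reads=1):
    """Score one pipeline's taxonomic profile against a known community.

    expected_taxa   -- iterable of taxon names truly present (assumed known)
    observed_counts -- dict mapping reported taxon name -> read count
    min_reads       -- hypothetical cutoff below which a taxon is ignored
    """
    expected = set(expected_taxa)
    # Taxa the pipeline reports with at least `min_reads` supporting reads.
    observed = {t for t, n in observed_counts.items() if n >= min_reads}
    true_positives = expected & observed

    precision = len(true_positives) / len(observed) if observed else 0.0
    recall = len(true_positives) / len(expected) if expected else 0.0
    # A ratio above 1 means more taxa were reported than are truly present,
    # which is one simple way to express richness overestimation.
    richness_ratio = len(observed) / len(expected) if expected else float("nan")

    return {
        "precision": precision,
        "recall": recall,
        "richness_ratio": richness_ratio,
        "false_positives": sorted(observed - expected),
        "false_negatives": sorted(expected - observed),
    }


# Illustrative data only: a four-genus mock community and a made-up output.
expected = ["Escherichia", "Bacillus", "Staphylococcus", "Listeria"]
observed = {
    "Escherichia": 500,
    "Bacillus": 300,
    "Staphylococcus": 150,
    "Shigella": 40,
    "Salmonella": 1,
}

if __name__ == "__main__":
    print(evaluate_profile(expected, observed))
    print(evaluate_profile(expected, observed, min_reads=2))
```

With the illustrative data, the default cutoff gives a precision of 0.6, a recall of 0.75, and a richness ratio of 1.25; raising `min_reads` to 2 drops the singleton taxon and brings the richness ratio back to 1.0. Running the same comparison across pipelines, databases, or sequencing depths is one way to see how each variable shifts these numbers.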


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6be7/5215245/44bfcd3a597a/pone.0169563.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6be7/5215245/f10c298b33ab/pone.0169563.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6be7/5215245/86181476d0e5/pone.0169563.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6be7/5215245/141f10f0309b/pone.0169563.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6be7/5215245/6b118372a789/pone.0169563.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6be7/5215245/8a3cf1844b8a/pone.0169563.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6be7/5215245/a46898321e7a/pone.0169563.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6be7/5215245/08cfb370de98/pone.0169563.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6be7/5215245/f488d8b0b20f/pone.0169563.g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6be7/5215245/992b0adf0319/pone.0169563.g010.jpg

