Concatenation of paired-end reads improves taxonomic classification of amplicons for profiling microbial communities.

Affiliation

Department of Biological Sciences, University of Massachusetts Lowell, Lowell, MA, USA.

Publication information

BMC Bioinformatics. 2021 Oct 12;22(1):493. doi: 10.1186/s12859-021-04410-2.

DOI:10.1186/s12859-021-04410-2

PMID:34641782

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8507205/

Abstract

BACKGROUND

Taxonomic classification of genetic markers for microbiome analysis is affected by the numerous choices made from sample preparation to bioinformatics analysis. Paired-end read merging is routinely used to capture the entire amplicon sequence when the read ends overlap. However, excluding unmerged reads from further analysis can lead to underestimating the diversity of the sequenced microbial community, an effect that is influenced by bioinformatic processes such as read trimming and the choice of reference database. A potential solution is to concatenate (join) reads that do not overlap and keep them for taxonomic classification. Concatenated reads can outperform single-end reads in taxonomic recovery, but it remains unclear how their performance compares to that of merged reads. Using various sequenced mock communities that differed in amplicon, read length, read depth, taxonomic composition, and sequence quality, we tested how merging and concatenating reads performed for genus recall and precision in bioinformatic pipelines that combined different read-trimming parameters with taxonomic classification against different reference databases.
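The difference between merging and concatenating a read pair can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the exact-match overlap search, the `min_overlap` default, and the choice to reverse-complement the reverse read before joining are all assumptions made for illustration. Production read mergers also tolerate mismatches and use base quality scores, which this sketch ignores.

```python
# Illustrative sketch only: exact-match overlap detection, no quality scores.
# All names and defaults here are hypothetical, not taken from the paper.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_overlap(fwd, rev_rc, min_overlap=10):
    """Length of the longest exact overlap between the end of fwd
    and the start of rev_rc, or 0 if none reaches min_overlap."""
    max_len = min(len(fwd), len(rev_rc))
    for k in range(max_len, min_overlap - 1, -1):
        if fwd[-k:] == rev_rc[:k]:
            return k
    return 0


def merge_or_concatenate(fwd, rev, min_overlap=10, spacer=""):
    """Merge a read pair when the ends overlap; otherwise concatenate
    (join) the two reads so the pair is kept for classification."""
    rev_rc = reverse_complement(rev)
    k = find_overlap(fwd, rev_rc, min_overlap)
    if k:
        # Overlapping pair: collapse the shared region into one sequence.
        return "merged", fwd + rev_rc[k:]
    # Non-overlapping pair: join the reads instead of discarding them.
    return "concatenated", fwd + spacer + rev_rc
```

Calling `merge_or_concatenate(fwd, rev)` returns a label and a sequence. A pair whose ends overlap by at least `min_overlap` bases yields the reconstructed amplicon, while a non-overlapping pair yields the two reads joined end to end, with the unsequenced middle of the amplicon missing.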

RESULTS

The addition of concatenated reads to merged reads always increased pipeline performance. The top two performing pipelines both included read concatenation, with relative strengths that varied by mock community. The pipeline that combined quality-trimmed merged and concatenated reads performed best for mock communities with larger amplicons and higher average sequence quality. The pipeline that used length-trimmed concatenated reads outperformed quality trimming in mock communities with lower-quality sequences but lost a substantial number of the input sequences available for taxonomic classification during processing. Genus-level classification was more accurate with the SILVA reference database than with Greengenes.
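Genus recall and precision for a mock community can be computed from the set of genera known to be present and the set a pipeline reports. The sketch below assumes the standard set-based definitions (recall = TP / (TP + FN), precision = TP / (TP + FP)). The function name and the genus names in the usage note are hypothetical and are not results from the study.

```python
# Illustrative sketch: set-based genus recall and precision.
# Assumes standard definitions; names are hypothetical.
from typing import Set, Tuple


def genus_recall_precision(expected: Set[str], observed: Set[str]) -> Tuple[float, float]:
    """Return (recall, precision) for observed genera against the
    known composition of a mock community."""
    true_pos = len(expected & observed)    # genera correctly recovered
    false_neg = len(expected - observed)   # expected genera that were missed
    false_pos = len(observed - expected)   # reported genera not in the mock
    recall = true_pos / (true_pos + false_neg) if expected else 0.0
    precision = true_pos / (true_pos + false_pos) if observed else 0.0
    return recall, precision
```

For example, with a hypothetical expected set {Bacillus, Escherichia, Listeria, Pseudomonas} and an observed set {Bacillus, Escherichia, Listeria, Clostridium, Shigella}, three genera are recovered, one is missed, and two are spurious, giving a recall of 0.75 and a precision of 0.6.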

CONCLUSIONS

Adding concatenated sequences that could not be merged to the merged sequences increased the performance of taxonomic classification. This was especially beneficial in mock communities with larger amplicons. Using an in-depth comparison of pipelines containing merged versus concatenated reads, combined with different trimming parameters and reference databases, we have shown for the first time the potential advantages of concatenating sequences for improving resolution in microbiome investigations.
