A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing.

Author information

Kim Hyeonwoo, Kim Jiwon, Choi Ji Won, Ahn Kwang-Sung, Park Dong-Il, Kim Sangsoo

Affiliations

Department of Bioinformatics, Soongsil University, Seoul 06978, Korea.

Department of Biological Sciences, Sungkyunkwan University, Suwon 16419, Korea.

Publication information

Genomics Inform. 2023 Sep;21(3):e40. doi: 10.5808/gi.23044. Epub 2023 Jul 31.

Abstract

Microbial community profiling using 16S rRNA amplicon sequencing allows for taxonomic characterization of diverse microorganisms. While amplicon sequence variant (ASV) methods are increasingly favored for their fine-grained resolution of sequence variants, they often discard substantial portions of sequencing reads during quality control, particularly in datasets with large numbers of samples. We present a streamlined pipeline that integrates FastP for read trimming, HmmUFOtu for operational taxonomic unit (OTU) clustering, Vsearch for chimera checking, and Kraken2 for taxonomic assignment. To assess the pipeline's performance, we reprocessed two published stool datasets from normal Korean populations: one with 890 independent samples and the other with 1,462. In the first dataset, after quality trimming and the removal of chimeric or unclassifiable reads, HmmUFOtu retained 93.2% of more than 104 million read pairs, whereas DADA2, a commonly used ASV method, retained only 44.6%. Nonetheless, both methods yielded qualitatively similar β-diversity plots. For the second dataset, HmmUFOtu retained 89.2% of read pairs, whereas DADA2 retained only 18.4%. Because HmmUFOtu is a closed-reference clustering method, it facilitates merging separately processed datasets: the OTUs shared between the two datasets showed a correlation coefficient of 0.92 in total abundance (log scale). While the first two dimensions of the β-diversity plot showed the two datasets to be well mixed, the third dimension revealed a batch effect. Our comparative evaluation of ASV and OTU methods within this streamlined pipeline provides insight into their performance on large-scale microbial 16S rRNA amplicon sequencing data, and highlights the strengths of HmmUFOtu and its potential for dataset merging.
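The dataset-merging comparison in the abstract (correlation of total abundance, on a log scale, over OTUs shared between two datasets) can be illustrated with a minimal sketch. It assumes a Pearson correlation on log10-transformed total counts; the abstract does not state which correlation coefficient or log base the authors used. The function name, the OTU identifiers, and the counts below are hypothetical toy values, not data from the study.

```python
import math


def shared_otu_log_correlation(counts_a, counts_b):
    """Pearson correlation of log10 total abundance over OTUs found in both datasets.

    counts_a, counts_b: dicts mapping OTU id -> total read count in that dataset.
    Assumption: Pearson on log10 counts; the source only says "log scale".
    """
    # Keep only OTUs observed (count > 0) in both datasets.
    shared = sorted(
        otu for otu in counts_a.keys() & counts_b.keys()
        if counts_a[otu] > 0 and counts_b[otu] > 0
    )
    if len(shared) < 2:
        raise ValueError("need at least two shared OTUs")

    xs = [math.log10(counts_a[otu]) for otu in shared]
    ys = [math.log10(counts_b[otu]) for otu in shared]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("abundances have zero variance on the log scale")

    return shared, cov / math.sqrt(var_x * var_y)


# Hypothetical toy totals: dataset_b is dataset_a scaled by 2 on the shared OTUs,
# so the log-scale correlation should be very close to 1.
dataset_a = {"OTU_1": 1000, "OTU_2": 100, "OTU_3": 10, "OTU_4": 5}
dataset_b = {"OTU_1": 2000, "OTU_2": 200, "OTU_3": 20, "OTU_5": 7}

shared, r = shared_otu_log_correlation(dataset_a, dataset_b)
print(shared, round(r, 3))
```

Restricting the comparison to shared OTUs is only meaningful because a closed-reference method assigns the same OTU identifiers in separately processed datasets; ASV tables from independent runs would first need their sequences matched.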

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e8d/10584646/407c27bf5f18/gi-23044f1.jpg
