Clustering of long-read amplicons improves phylogenetic insight into microbiome data.

Author information

Hui Yan, Sandris Nielsen Dennis, Krych Lukasz

Affiliations

Department of Preventive Medicine, School of Public Health and Nursing, Hangzhou Normal University, Hangzhou, China.

Department of Food Science, Faculty of Science, University of Copenhagen, Frederiksberg C, Denmark.

Publication information

Gut Microbes. 2025 Dec;17(1):2516703. doi: 10.1080/19490976.2025.2516703. Epub 2025 Jun 11.

DOI:10.1080/19490976.2025.2516703

PMID:40497323

Abstract

Long-read amplicon profiling through read classification limits phylogenetic analysis of amplicons, while community analysis of multicopy genes that relies on unique molecular identifier (UMI) correction often demands deep sequencing. To address this, we present a long amplicon consensus analysis (LACA) workflow employing multiple clustering approaches based on sequence dissimilarity. LACA keeps the average error rate of corrected sequences below 1% for Oxford Nanopore Technologies (ONT) R9.4.1 and R10.3 data, below 0.2% for ONT R10.4.1 data, and below 0.1% for high-accuracy ONT Duplex and Pacific Biosciences (PacBio) circular consensus sequencing (CCS) data, in both simulated 16S rRNA and real 16-23S rRNA amplicon datasets. In high-accuracy PacBio CCS data, the clustering-based correction matched UMI correction, while it outperformed 4× UMI correction in noisy ONT R10.3 and R9.4.1 data. Notably, LACA preserved phylogenetic fidelity in long operational taxonomic units and enhanced microbiome-wide phenotype characterization for synthetic mock communities and human vaginal samples.
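To make the idea of dissimilarity-based clustering followed by consensus correction concrete, the following is a minimal toy sketch in Python. It is not LACA's implementation: the function names, the greedy seed-based clustering, the edit-distance dissimilarity, the 0.2 threshold, and the column-wise majority vote (which assumes equal-length reads instead of a real multiple alignment) are all illustrative assumptions.

```python
from collections import Counter
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def dissimilarity(a: str, b: str) -> float:
    """Edit distance normalised by the longer sequence length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def greedy_cluster(reads: List[str], threshold: float) -> List[List[str]]:
    """Put each read in the first cluster whose seed read is within threshold."""
    clusters: List[List[str]] = []
    for read in reads:
        for cluster in clusters:
            if dissimilarity(read, cluster[0]) <= threshold:
                cluster.append(read)
                break
        else:
            clusters.append([read])  # no seed was close enough: start a new cluster
    return clusters


def majority_consensus(cluster: List[str]) -> str:
    """Column-wise majority vote; a toy stand-in that needs equal-length reads."""
    if len({len(r) for r in cluster}) != 1:
        raise ValueError("this toy consensus needs equal-length reads")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*cluster))


def error_rate(corrected: str, reference: str) -> float:
    """Edit distance to the reference, divided by the reference length."""
    return edit_distance(corrected, reference) / len(reference)


# Hypothetical noisy reads drawn from two made-up 10-bp templates.
reads = ["ACGTACGTAC", "ACGTACGTAT", "TTGGCCAATT", "ACGTTCGTAC", "TTGGCCAAGT"]
clusters = greedy_cluster(reads, threshold=0.2)
consensus = majority_consensus(clusters[0])
print(len(clusters), consensus, error_rate(consensus, "ACGTACGTAC"))
```

In this toy input, each noisy read in the first cluster carries one error at a different position, so the majority vote recovers the template even though two of the three reads differ from it. Real long-read data contains insertions and deletions, so a practical workflow would build the consensus from an alignment rather than from fixed columns.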

