DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs.

Authors

Wei Ze-Gang, Zhang Shao-Wu

Affiliations

Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China.

Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Science, Baoji, China.

Publication information

Front Microbiol. 2019 Mar 12;10:428. doi: 10.3389/fmicb.2019.00428. eCollection 2019.

DOI: 10.3389/fmicb.2019.00428
PMID: 30915052
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6422886/
Abstract

Next-generation sequencing (NGS)-based 16S rRNA sequencing, which combines PCR amplification with NGS technology, is a cost-effective technique that has been successfully used to study the phylogeny and taxonomy of samples from complex microbiomes or environments. Clustering 16S rRNA sequences into operational taxonomic units (OTUs) is often the first step for many downstream analyses. Heuristic clustering is one of the most widely employed approaches for generating OTUs. However, most heuristic OTU clustering methods select a single seed sequence to represent each cluster, so their outcomes suffer from either overestimation of the number of OTUs or sensitivity to sequencing errors. In this paper, we present a novel dynamic multi-seeds clustering method (DMSC) to pick OTUs. DMSC first heuristically generates clusters according to the distance threshold. When the size of a cluster reaches the pre-defined minimum size, DMSC selects the multi-core sequences (MCS) as the seeds; these are defined as the n-core sequences (n ≥ 3), in which the distance between any two sequences is less than the distance threshold. A new sequence is assigned to the corresponding cluster depending on its average distance to the MCS and the standard deviation of distances within the MCS. Whenever a new sequence is added to a cluster, DMSC dynamically updates the MCS, until no more sequences are merged into the cluster. DMSC was tested on several simulated and real-life sequence datasets and compared with traditional heuristic methods such as CD-HIT, UCLUST, and DBH. Experimental results in terms of the inferred number of OTUs, normalized mutual information (NMI), and Matthews correlation coefficient (MCC) demonstrate that DMSC can produce higher-quality clusters with low memory usage and reduce OTU overestimation. Additionally, DMSC is robust to sequencing errors. The DMSC software can be freely downloaded from https://github.com/NWPU-903PR/DMSC.
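
The clustering loop described in the abstract can be illustrated with a small Python sketch. This is not the DMSC implementation from the repository above, and several parts are assumptions made only for illustration: the toy mismatch-fraction `distance` (which requires equal-length sequences), the greedy seed selection in `pick_mcs`, the single pass over the input, and in particular the acceptance rule in `accepts` (mean distance to the MCS below the threshold plus the standard deviation of pairwise distances inside the MCS). The abstract says only that both quantities are used, not how they are combined. All function and parameter names are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def distance(a, b):
    """Toy distance: fraction of mismatched positions (equal lengths only)."""
    if len(a) != len(b):
        raise ValueError("this toy distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pick_mcs(members, threshold, n_core):
    """Greedily pick n_core members whose pairwise distances are all < threshold."""
    core = []
    for seq in members:
        if all(distance(seq, c) < threshold for c in core):
            core.append(seq)
            if len(core) == n_core:
                return core
    return None  # not enough mutually close sequences yet


def accepts(cluster, seq, threshold):
    """Decide whether seq joins the cluster."""
    mcs = cluster["mcs"]
    if mcs is None:
        # Single-seed phase: compare against the first sequence only.
        return distance(seq, cluster["members"][0]) < threshold
    # Multi-seed phase (ASSUMED rule, not the published formula): mean
    # distance to the MCS, relaxed by the spread of distances inside the MCS.
    spread = pstdev([distance(a, b) for a, b in combinations(mcs, 2)])
    return mean([distance(seq, c) for c in mcs]) < threshold + spread


def dmsc_sketch(sequences, threshold=0.03, min_size=3, n_core=3):
    """One greedy pass that switches a cluster to multi-seed mode once it is large enough."""
    clusters = []
    for seq in sequences:
        home = next((c for c in clusters if accepts(c, seq, threshold)), None)
        if home is None:
            clusters.append({"members": [seq], "mcs": None})
            continue
        home["members"].append(seq)
        if len(home["members"]) >= min_size:
            # Dynamic update: re-pick the seeds after every addition.
            home["mcs"] = pick_mcs(home["members"], threshold, n_core)
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    a1 = "ACGTACGTACGTACGTACGT"
    a2 = "TCGTACGTACGTACGTACGT"  # 1 mismatch vs a1
    a3 = "ACGTACGTACGTACGTACGA"  # 1 mismatch vs a1
    a4 = "TCGTACGTACCTACGTACGA"  # 3 mismatches vs a1, 2 vs a2 and a3
    b1 = "G" * 20
    for otu in dmsc_sketch([a1, a2, a3, b1, a4], threshold=0.15):
        print(otu)
```

In this toy input, `a4` sits exactly at the threshold distance from the first seed `a1`, so a single-seed comparison would open a new cluster for it. Its mean distance to the three core sequences is lower, so the multi-seed rule keeps it in the first cluster, which illustrates how multiple seeds can reduce OTU overestimation.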


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/685c/6422886/a01dbbd91584/fmicb-10-00428-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/685c/6422886/078ba9fda3d4/fmicb-10-00428-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/685c/6422886/f5ce51f0c1e6/fmicb-10-00428-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/685c/6422886/6b13af67b872/fmicb-10-00428-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/685c/6422886/62547bf48837/fmicb-10-00428-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/685c/6422886/b802e35cbb32/fmicb-10-00428-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/685c/6422886/15dd7f12eebd/fmicb-10-00428-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/685c/6422886/68bc11089a2b/fmicb-10-00428-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/685c/6422886/08b6cc5d83ca/fmicb-10-00428-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/685c/6422886/0e8247611c3f/fmicb-10-00428-g010.jpg

Similar articles

1
DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs.
Front Microbiol. 2019 Mar 12;10:428. doi: 10.3389/fmicb.2019.00428. eCollection 2019.
2
DBH: A de Bruijn graph-based heuristic method for clustering large-scale 16S rRNA sequences into OTUs.
J Theor Biol. 2017 Jul 21;425:80-87. doi: 10.1016/j.jtbi.2017.04.019. Epub 2017 Apr 26.
3
MtHc: a motif-based hierarchical method for clustering massive 16S rRNA sequences into OTUs.
Mol Biosyst. 2015 Jul;11(7):1907-13. doi: 10.1039/c5mb00089k.
4
MSClust: A Multi-Seeds based Clustering algorithm for microbiome profiling using 16S rRNA sequence.
J Microbiol Methods. 2013 Sep;94(3):347-55. doi: 10.1016/j.mimet.2013.07.004. Epub 2013 Jul 28.
5
bioOTU: An Improved Method for Simultaneous Taxonomic Assignments and Operational Taxonomic Units Clustering of 16s rRNA Gene Sequences.
J Comput Biol. 2016 Apr;23(4):229-38. doi: 10.1089/cmb.2015.0214. Epub 2016 Mar 7.
6
De novo clustering methods outperform reference-based methods for assigning 16S rRNA gene sequences to operational taxonomic units.
PeerJ. 2015 Dec 8;3:e1487. doi: 10.7717/peerj.1487. eCollection 2015.
7
Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering.
Microbiome. 2015 Oct 5;3:43. doi: 10.1186/s40168-015-0105-6.
8
DMclust, a Density-based Modularity Method for Accurate OTU Picking of 16S rRNA Sequences.
Mol Inform. 2017 Dec;36(12). doi: 10.1002/minf.201600059. Epub 2017 Jun 6.
9
DNACLUST: accurate and efficient clustering of phylogenetic marker genes.
BMC Bioinformatics. 2011 Jun 30;12:271. doi: 10.1186/1471-2105-12-271.
10
Assessing and improving methods used in operational taxonomic unit-based approaches for 16S rRNA gene sequence analysis.
Appl Environ Microbiol. 2011 May;77(10):3219-26. doi: 10.1128/AEM.02810-10. Epub 2011 Mar 18.

Articles citing this paper

1
pathMap: a path-based mapping tool for long noisy reads with high sensitivity.
Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae107.
2
An Immobilized Form of a Blend of Essential Oils Improves the Density of Beneficial Bacteria, in Addition to Suppressing Pathogens in the Gut and Also Improves the Performance of Chicken Breeding.
Microorganisms. 2023 Jul 31;11(8):1960. doi: 10.3390/microorganisms11081960.
3
Comparison of Methods for Picking the Operational Taxonomic Units From Amplicon Sequences.
Front Microbiol. 2021 Mar 24;12:644012. doi: 10.3389/fmicb.2021.644012. eCollection 2021.
4
Gut Microbial Composition Differs Extensively among Indian Native Chicken Breeds Originated in Different Geographical Locations and a Commercial Broiler Line, but Breed-Specific, as Well as Across-Breed Core Microbiomes, Are Found.
Microorganisms. 2021 Feb 14;9(2):391. doi: 10.3390/microorganisms9020391.
5
smsMap: mapping single molecule sequencing reads by locating the alignment starting positions.
BMC Bioinformatics. 2020 Aug 4;21(1):341. doi: 10.1186/s12859-020-03698-w.