DBH: A de Bruijn graph-based heuristic method for clustering large-scale 16S rRNA sequences into OTUs.

Authors

Wei Ze-Gang, Zhang Shao-Wu

Affiliation

Key Laboratory of Information Fusion Technology of Ministry of Education, College of Automation, Northwestern Polytechnical University, Xi'an 710072, China.

Publication information

J Theor Biol. 2017 Jul 21;425:80-87. doi: 10.1016/j.jtbi.2017.04.019. Epub 2017 Apr 26.

DOI: 10.1016/j.jtbi.2017.04.019
PMID: 28454900

Abstract

The recent sequencing revolution driven by high-throughput technologies has led to the rapid accumulation of 16S rRNA sequences for microbial communities. Clustering short sequences into operational taxonomic units (OTUs) is a crucial initial step in analyzing metagenomic data. Although many heuristic methods with low computational complexity have been proposed for OTU inference, they select only one sequence as the seed for each cluster, so the results are sensitive to the sequences chosen to represent the clusters. To address this issue, we present a de Bruijn graph-based heuristic clustering method (DBH) that clusters massive 16S rRNA sequences into OTUs by introducing a novel seed selection strategy and a greedy clustering approach. Compared with existing widely used methods on several simulated and real-life metagenomic datasets, DBH shows higher clustering performance and low memory usage, which helps reduce the overestimation of the number of OTUs. DBH is also more effective at handling large-scale metagenomic datasets. The DBH software can be freely downloaded from https://github.com/nwpu134/DBH.git for academic users.
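
The abstract names two building blocks, a de Bruijn graph and greedy seed-based clustering, without giving their details. The Python sketch below illustrates the generic versions of both ideas only; it is not the DBH implementation. The function names, the k-mer Jaccard similarity, and the threshold are assumptions chosen for illustration, and the seed here is simply the first unmatched sequence, which is exactly the single-seed behaviour the abstract says DBH's seed selection strategy is meant to improve on.

```python
from collections import Counter


def kmers(seq, k):
    """Return the list of overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_de_bruijn(seqs, k):
    """Build a weighted de Bruijn graph from a collection of sequences.

    Nodes are (k-1)-mers and each k-mer contributes one edge from its
    prefix to its suffix. Returns a Counter mapping (prefix, suffix)
    to the number of times that edge was observed.
    """
    edges = Counter()
    for s in seqs:
        for km in kmers(s, k):
            edges[(km[:-1], km[1:])] += 1
    return edges


def kmer_similarity(a, b, k):
    """Jaccard similarity of the k-mer sets of a and b (illustrative metric)."""
    ka, kb = set(kmers(a, k)), set(kmers(b, k))
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def greedy_cluster(seqs, k, threshold):
    """Generic greedy single-seed clustering (not DBH's seed strategy).

    Each sequence joins the first existing cluster whose seed is at least
    `threshold` similar; otherwise it becomes the seed of a new cluster.
    Returns a list of (seed, members) pairs.
    """
    clusters = []
    for s in seqs:
        for seed, members in clusters:
            if kmer_similarity(seed, s, k) >= threshold:
                members.append(s)
                break
        else:
            # No seed was similar enough: s founds a new cluster.
            clusters.append((s, [s]))
    return clusters


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTAA", "TTGGCCAATT"]  # toy data
    graph = build_de_bruijn(reads, k=4)
    print(len(graph), "distinct de Bruijn edges")
    for seed, members in greedy_cluster(reads, k=4, threshold=0.7):
        print(seed, "->", members)
```

Because the seed is fixed as soon as a cluster is created, reordering the input can change the resulting clusters, which illustrates the seed sensitivity the abstract describes.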

Similar articles

1
DBH: A de Bruijn graph-based heuristic method for clustering large-scale 16S rRNA sequences into OTUs.
J Theor Biol. 2017 Jul 21;425:80-87. doi: 10.1016/j.jtbi.2017.04.019. Epub 2017 Apr 26.
2
MtHc: a motif-based hierarchical method for clustering massive 16S rRNA sequences into OTUs.
Mol Biosyst. 2015 Jul;11(7):1907-13. doi: 10.1039/c5mb00089k.
3
DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs.
Front Microbiol. 2019 Mar 12;10:428. doi: 10.3389/fmicb.2019.00428. eCollection 2019.
4
DMclust, a Density-based Modularity Method for Accurate OTU Picking of 16S rRNA Sequences.
Mol Inform. 2017 Dec;36(12). doi: 10.1002/minf.201600059. Epub 2017 Jun 6.
5
Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering.
Microbiome. 2015 Oct 5;3:43. doi: 10.1186/s40168-015-0105-6.
6
bioOTU: An Improved Method for Simultaneous Taxonomic Assignments and Operational Taxonomic Units Clustering of 16s rRNA Gene Sequences.
J Comput Biol. 2016 Apr;23(4):229-38. doi: 10.1089/cmb.2015.0214. Epub 2016 Mar 7.
7
MSClust: A Multi-Seeds based Clustering algorithm for microbiome profiling using 16S rRNA sequence.
J Microbiol Methods. 2013 Sep;94(3):347-55. doi: 10.1016/j.mimet.2013.07.004. Epub 2013 Jul 28.
8
A De Novo Robust Clustering Approach for Amplicon-Based Sequence Data.
J Comput Biol. 2019 Jun;26(6):618-624. doi: 10.1089/cmb.2018.0170. Epub 2018 Dec 5.
9
Piphillin predicts metagenomic composition and dynamics from DADA2-corrected 16S rDNA sequences.
BMC Genomics. 2020 Jan 17;21(1):56. doi: 10.1186/s12864-019-6427-1.
10
CLUSTOM-CLOUD: In-Memory Data Grid-Based Software for Clustering 16S rRNA Sequence Data in the Cloud Environment.
PLoS One. 2016 Mar 8;11(3):e0151064. doi: 10.1371/journal.pone.0151064. eCollection 2016.

Cited by

1
pathMap: a path-based mapping tool for long noisy reads with high sensitivity.
Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae107.
2
invMap: a sensitive mapping tool for long noisy reads with inversion structural variants.
Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad726.
3
Comparison of Methods for Picking the Operational Taxonomic Units From Amplicon Sequences.
Front Microbiol. 2021 Mar 24;12:644012. doi: 10.3389/fmicb.2021.644012. eCollection 2021.
4
Metagenomic data of bacterial community from different land uses at the river basin, Kelantan.
Data Brief. 2020 Sep 28;33:106351. doi: 10.1016/j.dib.2020.106351. eCollection 2020 Dec.
5
smsMap: mapping single molecule sequencing reads by locating the alignment starting positions.
BMC Bioinformatics. 2020 Aug 4;21(1):341. doi: 10.1186/s12859-020-03698-w.
6
DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs.
Front Microbiol. 2019 Mar 12;10:428. doi: 10.3389/fmicb.2019.00428. eCollection 2019.
7
NPBSS: a new PacBio sequencing simulator for generating the continuous long reads with an empirical model.
BMC Bioinformatics. 2018 May 22;19(1):177. doi: 10.1186/s12859-018-2208-0.