OGRE: Overlap Graph-based metagenomic Read clustEring.

Affiliations

Life Sciences & Health, Centrum Wiskunde & Informatica, Amsterdam 1098 XG, The Netherlands.

Theoretical Biology & Bioinformatics, Utrecht University, Utrecht 3512 JE, The Netherlands.

Publication information

Bioinformatics. 2021 May 17;37(7):905-912. doi: 10.1093/bioinformatics/btaa760.

DOI:10.1093/bioinformatics/btaa760
PMID:32871010
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8128468/
Abstract

MOTIVATION

The microbes that live in an environment can be identified from their combined genomic material, also referred to as the metagenome. Sequencing a metagenome can produce large volumes of sequencing reads. A promising approach to reducing the size of metagenomic datasets is to cluster reads into groups based on their overlaps. Clustering reads is valuable for facilitating downstream analyses, including computationally intensive strain-aware assembly. As current read clustering approaches cannot handle the large datasets arising from high-throughput metagenome sequencing, a novel read clustering approach is needed. In this article, we propose OGRE, an Overlap Graph-based Read clustEring procedure for high-throughput sequencing data, with a focus on shotgun metagenomes.
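The general idea of grouping reads by their overlaps can be illustrated with a toy sketch. This is not OGRE's actual procedure: it assumes exact suffix-prefix matches, compares all read pairs naively (quadratic time, so unsuitable for real datasets), and treats each connected component of the overlap graph as a cluster. The function names and the `min_overlap` parameter are hypothetical.

```python
from collections import defaultdict


def suffix_prefix_overlap(a, b, min_len):
    """Return the longest k >= min_len such that the last k bases of a
    equal the first k bases of b, or 0 if there is no such k."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def cluster_reads(reads, min_overlap=4):
    """Cluster read indices into connected components of the overlap graph.

    Nodes are reads; an edge joins two reads when one read's suffix
    exactly matches the other's prefix over at least min_overlap bases.
    """
    parent = list(range(len(reads)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # Naive all-versus-all comparison (illustration only).
    for i in range(len(reads)):
        for j in range(len(reads)):
            if i != j and suffix_prefix_overlap(reads[i], reads[j], min_overlap):
                union(i, j)

    clusters = defaultdict(list)
    for i in range(len(reads)):
        clusters[find(i)].append(i)
    return sorted(clusters.values())


reads = ["ACGTACGG", "ACGGTTTA", "TTTACCCC", "GGGGGGGA", "GGGACATA"]
clusters = cluster_reads(reads, min_overlap=4)
print(clusters)
```

In this toy input the first three reads chain together through 4-base overlaps and the last two form a second group. A practical pipeline would replace the pairwise loop with a dedicated overlap detector and would need to tolerate sequencing errors.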

RESULTS

We show that for small datasets OGRE outperforms other read binners in terms of the number of species included in a cluster, also referred to as cluster purity, and the fraction of all reads that is placed in one of the clusters. Furthermore, OGRE is able to process metagenomic datasets that are too large for other read binners into clusters with high cluster purity.
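The two evaluation criteria named above can be computed as follows. This is a minimal sketch under stated assumptions: clusters are disjoint lists of read identifiers, a ground-truth species label is known for every read (as with simulated data), and reads absent from every cluster count as unclustered. The function name and data layout are hypothetical and not taken from the paper.

```python
def evaluate_clusters(clusters, species_of, n_reads):
    """Return (species count per cluster, fraction of reads clustered).

    clusters   : list of disjoint lists of read identifiers
    species_of : dict mapping read identifier -> true species label
    n_reads    : total number of reads in the dataset
    """
    # A cluster containing a single species is perfectly pure.
    species_per_cluster = [len({species_of[r] for r in c}) for c in clusters]
    clustered = sum(len(c) for c in clusters)
    return species_per_cluster, clustered / n_reads


species_of = {0: "A", 1: "A", 2: "B", 3: "C", 4: "C", 5: "A"}
species_counts, fraction = evaluate_clusters([[0, 1, 2], [3, 4]], species_of, 6)
print(species_counts, fraction)
```

Here the first cluster mixes two species, the second is pure, and read 5 is left unclustered.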

CONCLUSION

OGRE is the only method that can successfully cluster reads into species-specific clusters for large metagenomic datasets without running into computation time or memory issues.

AVAILABILITY AND IMPLEMENTATION

Code is available on GitHub (https://github.com/Marleen1/OGRE).

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
