MegaGTA: a sensitive and accurate metagenomic gene-targeted assembler using iterative de Bruijn graphs.

Authors

Li Dinghua, Huang Yukun, Leung Chi-Ming, Luo Ruibang, Ting Hing-Fung, Lam Tak-Wah

Affiliations

Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong.

L3 Bioinformatics Limited, Western District, Hong Kong.

Publication information

BMC Bioinformatics. 2017 Oct 16;18(Suppl 12):408. doi: 10.1186/s12859-017-1825-3.

DOI: 10.1186/s12859-017-1825-3
PMID: 29072142
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5657035/
Abstract

BACKGROUND

The recent release of the gene-targeted metagenomic assembler Xander demonstrated that using a trained hidden Markov model (HMM) to guide the traversal of a de Bruijn graph gives a clear advantage over other assembly methods. As a pilot study, however, Xander leaves considerable room for improvement. Apart from being slow, Xander uses only a single k-mer size for graph construction, and any choice of k compromises either sensitivity or accuracy. Xander represents the de Bruijn graph with a Bloom filter to reduce its memory footprint, but Bloom filters introduce false positives, and it is not clear how these affect assembly quality. Xander also does not track the multiplicity of k-mers, which would be an effective way to distinguish erroneous k-mers from correct ones.
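The two limitations named above, Bloom-filter false positives and the lack of k-mer multiplicity, can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not Xander's implementation: the `BloomFilter` class, its sizes `m` and `h`, the helper `solid_kmers`, and the example reads are all hypothetical.

```python
import hashlib
from collections import Counter
from itertools import product


def kmers(seq, k):
    """Yield every k-mer (substring of length k) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class BloomFilter:
    """Toy Bloom filter with m slots and h hash functions derived from SHA-256."""

    def __init__(self, m, h):
        self.m, self.h = m, h
        self.bits = bytearray(m)  # one byte per slot, for readability

    def _positions(self, item):
        for seed in range(self.h):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        # May answer True for an item that was never added (false positive),
        # but never answers False for an item that was added.
        return all(self.bits[p] for p in self._positions(item))


def solid_kmers(reads, k, min_count):
    """Keep k-mers seen at least min_count times; rare k-mers are likely errors."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    return {km for km, c in counts.items() if c >= min_count}


# Hypothetical reads: three error-free copies and one read with a substitution.
reads = ["ACGTACGGTC"] * 3 + ["ACGTTCGGTC"]
k = 4

exact = {km for read in reads for km in kmers(read, k)}
bloom = BloomFilter(m=64, h=2)
for km in exact:
    bloom.add(km)

# k-mers that were never inserted but that the filter may still report.
absent = ["".join(p) for p in product("ACGT", repeat=k) if "".join(p) not in exact]
false_positives = [km for km in absent if km in bloom]
print(len(false_positives), "of", len(absent), "absent k-mers reported as present")

# Multiplicity filtering drops the four k-mers created by the substitution.
print(sorted(solid_kmers(reads, k, min_count=2)))
```

An exact set answers membership without false positives, and a counter adds the multiplicity that a plain Bloom filter does not store. Those are the two properties the abstract says Xander's representation lacks.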

RESULTS

In this paper, we present a new gene-targeted assembler, MegaGTA, which improves on Xander in several respects. In terms of quality, it uses iterative de Bruijn graphs to take full advantage of multiple k-mer sizes and so achieve both sensitivity and accuracy. In terms of computation, it employs succinct de Bruijn graphs (SdBG) to achieve a low memory footprint and high speed (the latter benefits from a highly efficient parallel algorithm for constructing the SdBG). Unlike a Bloom filter, an SdBG is an exact representation of a de Bruijn graph. It enables MegaGTA to avoid false-positive contigs and to easily incorporate the multiplicity of k-mers to build a better HMM. We compared MegaGTA and Xander on an HMP-defined mock metagenomic dataset and showed that MegaGTA excelled in both sensitivity and accuracy. On a large rhizosphere soil metagenomic sample (327 Gbp), MegaGTA produced 9.7-19.3% more contigs than Xander, and these contigs were assigned to 10-25% more gene references. In our experiments, MegaGTA was two to ten times faster than Xander, depending on the number of k-mers used.
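The idea of iterating over k-mer sizes can be sketched as follows. This is a simplified illustration, not MegaGTA's code: it uses plain Python sets rather than a succinct de Bruijn graph, ignores reverse complements and HMM guidance, and skips isolated cycles. The function names, the `min_count` parameter, and the example reads are assumptions made for the sketch.

```python
from collections import Counter


def kmers(seq, k):
    """Return all k-mers of seq (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def successors(node, nodes):
    return [node[1:] + b for b in "ACGT" if node[1:] + b in nodes]


def predecessors(node, nodes):
    return [b + node[:-1] for b in "ACGT" if b + node[:-1] in nodes]


def unitigs(nodes):
    """Spell maximal non-branching paths of a node-centric de Bruijn graph."""
    contigs, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        preds = predecessors(start, nodes)
        # Skip nodes that sit in the middle of a non-branching path.
        if len(preds) == 1 and len(successors(preds[0], nodes)) == 1:
            continue
        path, node = start, start
        seen.add(start)
        while True:
            succ = successors(node, nodes)
            if len(succ) != 1:
                break
            nxt = succ[0]
            if len(predecessors(nxt, nodes)) != 1 or nxt in seen:
                break
            path += nxt[-1]
            seen.add(nxt)
            node = nxt
        contigs.append(path)
    return contigs


def iterative_assembly(reads, k_list, min_count=1):
    """Assemble with increasing k, feeding contigs from smaller k into larger k."""
    contigs = []
    for k in k_list:
        counts = Counter(km for read in reads for km in kmers(read, k))
        nodes = {km for km, c in counts.items() if c >= min_count}
        # Contigs from the previous k contribute all of their k-mers.
        for contig in contigs:
            nodes.update(kmers(contig, k))
        contigs = unitigs(nodes)
    return contigs


# Hypothetical reads from ACGTACGGTCATTGA that overlap by only three bases.
reads = ["ACGTACGGTCA", "TCATTGA"]
print(iterative_assembly(reads, [3]))     # ['CGGT', 'CGT', 'GTACG', 'GTCATTGA']
print(iterative_assembly(reads, [5]))     # ['ACGTACGGTCA', 'TCATTGA']
print(iterative_assembly(reads, [3, 5]))  # ['ACGTACGGTCATTGA']
```

In this toy case k = 3 bridges the weakly overlapping reads but fragments at the repeated `ACG`, while k = 5 resolves the repeat but cannot bridge the gap. Running both in sequence recovers the full sequence, which is the sensitivity and accuracy trade-off the abstract describes.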

CONCLUSION

MegaGTA improves on the algorithm of Xander and achieves higher sensitivity, accuracy and speed. Moreover, it can assemble gene sequences from ultra-large metagenomic datasets. Its source code is freely available at https://github.com/HKU-BAL/megagta.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d30/5657035/67533cdc44a5/12859_2017_1825_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d30/5657035/32d852088445/12859_2017_1825_Fig1_HTML.jpg