Metagenomic classification with KrakenUniq on low-memory computers.

Authors

Pockrandt Christopher, Zimin Aleksey V, Salzberg Steven L

Affiliations

Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA.

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA.

Publication information

J Open Source Softw. 2022;7(80). doi: 10.21105/joss.04908. Epub 2022 Dec 28.

DOI: 10.21105/joss.04908
PMID: 37602140
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10438097/
Abstract

Kraken and KrakenUniq are widely used tools for classifying metagenomics sequences. A key requirement for these systems is a database containing all k-mers from all genomes that the users want to be able to detect, where k = 31 by default. This database can be very large, easily exceeding 100 gigabytes (GB) and sometimes 400 GB. Previously, Kraken and KrakenUniq required loading the entire database into main memory (RAM), and if RAM was insufficient, they used memory mapping, which significantly increased the running time for large datasets. We have implemented a new algorithm in KrakenUniq that allows it to load and process the database in chunks, with only a modest increase in running time. This enhancement makes it feasible to run KrakenUniq on very large datasets and huge databases on virtually any computer, even a laptop, while providing the same very high classification accuracy as the previous system.
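
The chunked approach described above can be illustrated with a minimal Python sketch. It is not KrakenUniq's actual implementation: the function names (kmers, chunks, classify_in_chunks), the in-memory list standing in for the on-disk database, and the chunk_size parameter are all hypothetical, and the sketch leaves out details a real classifier needs, such as canonical k-mers and taxonomy-aware assignment. It only shows the core idea that each chunk of the k-mer database is loaded once, every read k-mer is checked against it, and the per-read hit counts are accumulated across chunks.

```python
from typing import Dict, Iterator, List, Tuple

K = 31  # default k-mer length stated in the abstract


def kmers(seq: str, k: int = K) -> Iterator[str]:
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def chunks(db: List[Tuple[str, int]], chunk_size: int) -> Iterator[List[Tuple[str, int]]]:
    """Yield consecutive slices of the database.

    Hypothetical stand-in for reading one part of a large on-disk
    database into RAM at a time.
    """
    for start in range(0, len(db), chunk_size):
        yield db[start:start + chunk_size]


def classify_in_chunks(
    reads: Dict[str, str],
    db: List[Tuple[str, int]],
    chunk_size: int,
    k: int = K,
) -> Dict[str, Dict[int, int]]:
    """Count, per read, how many of its k-mers map to each taxon ID.

    Only one chunk of the (k-mer, taxon ID) database is held in a
    lookup table at any time; hit counts accumulate across chunks.
    """
    read_kmers = {name: list(kmers(seq, k)) for name, seq in reads.items()}
    hits: Dict[str, Dict[int, int]] = {name: {} for name in reads}
    for chunk in chunks(db, chunk_size):
        table = dict(chunk)  # only this chunk is resident in memory
        for name, kms in read_kmers.items():
            for km in kms:
                taxon = table.get(km)
                if taxon is not None:
                    hits[name][taxon] = hits[name].get(taxon, 0) + 1
    return hits


if __name__ == "__main__":
    # Toy database with k = 3 so the example is easy to follow.
    toy_db = [("ACG", 1), ("CGT", 1), ("GTA", 2), ("TTT", 3)]
    toy_reads = {"r1": "ACGTA"}
    print(classify_in_chunks(toy_reads, toy_db, chunk_size=2, k=3))
```

Because every k-mer lives in exactly one chunk, the accumulated counts are the same whatever the chunk size. This is why, in this simplified model, chunking trades extra passes over the reads for lower peak memory without changing the classification result.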

STATEMENT OF NEED

The KrakenUniq software classifies reads from metagenomic samples to establish which organisms are present in the samples and estimate their abundance. The software is widely used by researchers and clinicians in medical diagnostics, microbiome and environmental studies. Typical databases used by KrakenUniq are tens to hundreds of gigabytes in size. The original KrakenUniq code required loading the entire database in RAM, which demanded expensive high-memory servers to run it efficiently. If a user did not have enough physical RAM to load the entire database, KrakenUniq resorted to memory-mapping the database, which significantly increased run times, frequently by a factor of more than 100. The new functionality described in this paper enables users who do not have access to high-memory servers to run KrakenUniq efficiently, with CPU time increasing by only 3- to 4-fold, down from the more than 100-fold increase seen with memory mapping.
