Department of Computer Science, University of Kentucky, Lexington, KY, USA.
Bioinformatics. 2018 Jan 1;34(1):171-178. doi: 10.1093/bioinformatics/btx432.
MOTIVATION: Metagenomic read classification is a critical step in the identification and quantification of microbial species sampled by high-throughput sequencing. Although many algorithms have been developed to date, they suffer from significant memory and/or computational costs. Due to the growing popularity of metagenomic data in both basic science and clinical applications, as well as the increasing volume of data being generated, efficient and accurate algorithms are in high demand.
RESULTS: We introduce MetaOthello, a probabilistic hashing classifier for metagenomic sequencing reads. The algorithm employs a novel data structure, called l-Othello, to support efficient querying of a taxon using its k-mer signatures. MetaOthello is an order of magnitude faster than the current state-of-the-art algorithms Kraken and Clark, and requires only one-third of the RAM. In comparison to Kaiju, a metagenomic classification tool using protein sequences instead of genomic sequences, MetaOthello is three times faster and exhibits 20-30% higher classification sensitivity. We report comparative analyses of both scalability and accuracy using a number of simulated and empirical datasets.
AVAILABILITY AND IMPLEMENTATION: MetaOthello is a stand-alone program implemented in C++. The current version (1.0) is accessible via https://doi.org/10.5281/zenodo.808941.
CONTACT: liuj@cs.uky.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
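The abstract names l-Othello only in passing. As a rough illustration of the general Othello-style idea (a key's l-bit value is recovered as the XOR of one cell from each of two hashed arrays), here is a minimal Python sketch under stated assumptions. The class `OthelloSketch`, the helpers `kmers` and `vote`, the blake2b-based hash, the array sizes, and the majority-vote read assignment are all hypothetical and are not taken from MetaOthello's C++ implementation.

```python
import hashlib
from collections import Counter, defaultdict, deque


def _hash(key: str, seed: int, size: int) -> int:
    # Seeded 64-bit hash; each seed gives an independent-looking function.
    digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % size


class OthelloSketch:
    """Hypothetical Othello-style table: value(key) = A[ha(key)] ^ B[hb(key)]."""

    def __init__(self, items: dict[str, int], l: int, max_tries: int = 50):
        if any(not 0 <= v < 2 ** l for v in items.values()):
            raise ValueError("every value must fit in l bits")
        # Assumption: 2n cells per array keeps the key graph sparse enough
        # that a consistent assignment is usually found within a few seeds.
        self.ma = self.mb = 2 * max(1, len(items))
        for seed in range(max_tries):
            if self._build(items, seed):
                return
        raise RuntimeError("no consistent assignment found; enlarge the arrays")

    def _build(self, items: dict[str, int], seed: int) -> bool:
        self.sa, self.sb = 2 * seed, 2 * seed + 1
        # Bipartite graph: one edge per key between cell ha(key) of A and hb(key) of B.
        adj = defaultdict(list)
        for key, val in items.items():
            a = ("A", _hash(key, self.sa, self.ma))
            b = ("B", _hash(key, self.sb, self.mb))
            adj[a].append((b, val))
            adj[b].append((a, val))
        label = {}
        for start in adj:
            if start in label:
                continue
            label[start] = 0  # the root of each component can take any value
            queue = deque([start])
            while queue:
                u = queue.popleft()
                for w, val in adj[u]:
                    want = label[u] ^ val  # makes label[u] ^ label[w] == val
                    if w not in label:
                        label[w] = want
                        queue.append(w)
                    elif label[w] != want:
                        return False  # cycle with contradictory constraints: reseed
        self.A, self.B = [0] * self.ma, [0] * self.mb
        for (side, idx), x in label.items():
            (self.A if side == "A" else self.B)[idx] = x
        return True

    def query(self, key: str) -> int:
        # Exact for stored keys; an arbitrary l-bit value for any other key.
        return self.A[_hash(key, self.sa, self.ma)] ^ self.B[_hash(key, self.sb, self.mb)]


def kmers(seq: str, k: int) -> list[str]:
    # All overlapping substrings of length k.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def vote(table: OthelloSketch, read: str, k: int):
    # Hypothetical read assignment: the taxon ID returned most often by the read's k-mers.
    counts = Counter(table.query(km) for km in kmers(read, k))
    return counts.most_common(1)[0][0] if counts else None
```

For example, building the table from the 4-mers of two toy sequences labeled taxon 1 and taxon 2 lets `vote(table, read, 4)` return 2 for a read drawn from the second sequence. A stored k-mer always returns its own taxon ID, but a k-mer that was never inserted still returns some l-bit value; that is what makes this kind of lookup probabilistic, and a real classifier needs an extra mechanism for such non-member queries, which this sketch does not attempt.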