A novel data structure to support ultra-fast taxonomic classification of metagenomic sequences with k-mer signatures.

Affiliations

Department of Computer Science, University of Kentucky, Lexington, KY, USA.

Publication information

Bioinformatics. 2018 Jan 1;34(1):171-178. doi: 10.1093/bioinformatics/btx432.

Abstract

MOTIVATION

Metagenomic read classification is a critical step in the identification and quantification of microbial species sampled by high-throughput sequencing. Although many algorithms have been developed to date, they incur significant memory and/or computational costs. Due to the growing popularity of metagenomic data in both basic science and clinical applications, as well as the increasing volume of data being generated, efficient and accurate algorithms are in high demand.

RESULTS

We introduce MetaOthello, a probabilistic hashing classifier for metagenomic sequencing reads. The algorithm employs a novel data structure, called l-Othello, to support efficient querying of a taxon using its k-mer signatures. MetaOthello is an order of magnitude faster than the current state-of-the-art algorithms Kraken and Clark, and requires only one-third of the RAM. In comparison to Kaiju, a metagenomic classification tool using protein sequences instead of genomic sequences, MetaOthello is three times faster and exhibits 20-30% higher classification sensitivity. We report comparative analyses of both scalability and accuracy using a number of simulated and empirical datasets.
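The abstract names l-Othello but does not describe its internals. Assuming it follows the Othello-hashing scheme, in which each key's value is recovered as the XOR of one cell in each of two arrays (placed by building an acyclic bipartite graph over the keys), the Python sketch below illustrates the idea on a toy k-mer index. Everything here is hypothetical: `OthelloSketch`, `build_index` and `classify` are illustrative names, k-mers shared by several taxa are simply dropped, and reads are assigned by a plain majority vote. MetaOthello's actual handling of shared k-mers, taxonomic levels and non-matching k-mers may differ.

```python
import hashlib
from collections import Counter, deque


def _slot(key: str, seed: int, side: str, size: int) -> int:
    """Deterministically hash a key into one of `size` slots on side 'a' or 'b'."""
    digest = hashlib.sha256(f"{seed}:{side}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "little") % size


class OthelloSketch:
    """Hypothetical, simplified Othello-style static map from keys to small integer labels.

    Each key k is stored so that A[ha(k)] ^ B[hb(k)] == label(k). There is no
    membership test: querying a key that was never inserted returns an arbitrary label.
    """

    def __init__(self, mapping: dict, size_a: int, size_b: int, max_seeds: int = 100):
        self.size_a, self.size_b = size_a, size_b
        for seed in range(max_seeds):  # rehash with a new seed until the key graph is acyclic
            self.seed = seed
            if self._build(mapping):
                return
        raise RuntimeError("no acyclic placement found; try larger arrays")

    def _ends(self, key: str):
        # Graph node ids: A-slots are 0..size_a-1, B-slots follow at size_a..size_a+size_b-1.
        return (_slot(key, self.seed, "a", self.size_a),
                self.size_a + _slot(key, self.seed, "b", self.size_b))

    def _build(self, mapping: dict) -> bool:
        n = self.size_a + self.size_b
        parent = list(range(n))  # union-find, used only to detect cycles

        def find(x: int) -> int:
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        adj = [[] for _ in range(n)]
        for key, label in mapping.items():
            u, v = self._ends(key)
            ru, rv = find(u), find(v)
            if ru == rv:  # this edge would close a cycle (or duplicate an edge)
                return False
            parent[ru] = rv
            adj[u].append((v, label))
            adj[v].append((u, label))

        # The edges form a forest, so one BFS per tree can satisfy value[u] ^ value[v] == label.
        value = [None] * n
        for root in range(n):
            if value[root] is not None:
                continue
            value[root] = 0
            queue = deque([root])
            while queue:
                u = queue.popleft()
                for v, label in adj[u]:
                    if value[v] is None:
                        value[v] = value[u] ^ label
                        queue.append(v)
        self.A, self.B = value[:self.size_a], value[self.size_a:]
        return True

    def query(self, key: str) -> int:
        u, v = self._ends(key)
        return self.A[u] ^ self.B[v - self.size_a]


def kmers(seq: str, k: int):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(genomes: dict, k: int) -> OthelloSketch:
    """Map each k-mer found in exactly one taxon to that taxon id (ids must be >= 1)."""
    owner = {}
    for taxon, seq in genomes.items():
        for km in kmers(seq, k):
            # 0 marks a k-mer shared by several taxa; such k-mers are dropped below.
            owner[km] = taxon if owner.get(km, taxon) == taxon else 0
    unique = {km: t for km, t in owner.items() if t != 0}
    size = 2 * len(unique) + 1  # generous arrays keep the key graph likely acyclic
    return OthelloSketch(unique, size, size)


def classify(read: str, index: OthelloSketch, k: int, n_taxa: int):
    """Majority vote over the read's k-mers; returns a taxon id or None."""
    votes = Counter()
    for km in kmers(read, k):
        label = index.query(km)
        if 1 <= label <= n_taxa:  # absent k-mers may yield any value; ignore out-of-range ones
            votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None
```

In this sketch a lookup costs two hashes and one XOR, independent of the number of taxa. The trade-off is that a k-mer absent from every reference still returns some label, so a practical classifier needs safeguards beyond the out-of-range filter used here.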

AVAILABILITY AND IMPLEMENTATION

MetaOthello is a stand-alone program implemented in C++. The current version (1.0) is accessible via https://doi.org/10.5281/zenodo.808941.

CONTACT

liuj@cs.uky.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
