A novel data structure to support ultra-fast taxonomic classification of metagenomic sequences with k-mer signatures.

Affiliations

Department of Computer Science, University of Kentucky, Lexington, KY, USA.

Publication information

Bioinformatics. 2018 Jan 1;34(1):171-178. doi: 10.1093/bioinformatics/btx432.


DOI: 10.1093/bioinformatics/btx432
PMID: 29036588
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5870563/
Abstract

MOTIVATION: Metagenomic read classification is a critical step in the identification and quantification of microbial species sampled by high-throughput sequencing. Although many algorithms have been developed to date, they suffer from significant memory and/or computational costs. Due to the growing popularity of metagenomic data in both basic science and clinical applications, as well as the increasing volume of data being generated, efficient and accurate algorithms are in high demand.

RESULTS: We introduce MetaOthello, a probabilistic hashing classifier for metagenomic sequencing reads. The algorithm employs a novel data structure, called l-Othello, to support efficient querying of a taxon using its k-mer signatures. MetaOthello is an order of magnitude faster than the current state-of-the-art algorithms Kraken and CLARK, and requires only one-third of the RAM. In comparison to Kaiju, a metagenomic classification tool using protein sequences instead of genomic sequences, MetaOthello is three times faster and exhibits 20-30% higher classification sensitivity. We report comparative analyses of both scalability and accuracy using a number of simulated and empirical datasets.

AVAILABILITY AND IMPLEMENTATION: MetaOthello is a stand-alone program implemented in C++. The current version (1.0) is accessible via https://doi.org/10.5281/zenodo.808941.

CONTACT: liuj@cs.uky.edu.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
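The abstract names the l-Othello structure but does not describe its layout. The sketch below is a minimal illustration under an assumption: it uses an Othello-style table, where each key hashes to one slot in each of two arrays and its stored value is recovered as the XOR of those two slots, so a query costs two array reads and one XOR. The class `OthelloSketch`, the SplitMix64-based hashing, the seed-retry loop and the majority-vote `classifyRead` function are all hypothetical illustrations, not MetaOthello's code or classification rule.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// SplitMix64 finalizer: scrambles 64 bits so the two slots look independent.
inline uint64_t mix64(uint64_t x) {
  x += 0x9E3779B97F4A7C15ULL;
  x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
  x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
  return x ^ (x >> 31);
}

// Hypothetical Othello-style table (an assumption, not MetaOthello's code).
// A key maps to slot a in arrA_ and slot b in arrB_; its value is arrA_[a] ^ arrB_[b].
// Keys that were never inserted still return some value: there is no membership test.
class OthelloSketch {
 public:
  // Keys must be distinct. Tries several hash seeds until the key graph is acyclic.
  bool build(const std::vector<std::pair<std::string, uint32_t>>& items,
             std::size_t m, int maxSeeds = 64) {
    m_ = m;
    for (int s = 0; s < maxSeeds; ++s) {
      seed_ = static_cast<uint64_t>(s);
      if (tryBuild(items)) return true;
    }
    return false;
  }

  // Call only after a successful build().
  uint32_t query(const std::string& key) const {
    return arrA_[slotA(key)] ^ arrB_[slotB(key)];
  }

 private:
  std::size_t m_ = 0;
  uint64_t seed_ = 0;
  std::vector<uint32_t> arrA_, arrB_;

  uint64_t baseHash(const std::string& key) const {
    return mix64(std::hash<std::string>{}(key) ^ seed_);
  }
  std::size_t slotA(const std::string& key) const { return baseHash(key) % m_; }
  std::size_t slotB(const std::string& key) const { return mix64(baseHash(key)) % m_; }

  bool tryBuild(const std::vector<std::pair<std::string, uint32_t>>& items) {
    const std::size_t n = 2 * m_;  // nodes [0, m) are arrA_ slots, [m, 2m) are arrB_ slots
    std::vector<std::size_t> parent(n);
    std::iota(parent.begin(), parent.end(), std::size_t{0});
    auto find = [&](std::size_t x) {
      while (parent[x] != x) {
        parent[x] = parent[parent[x]];  // path halving
        x = parent[x];
      }
      return x;
    };
    // Each key is an edge between its two slots, labelled with its value.
    std::vector<std::vector<std::pair<std::size_t, uint32_t>>> adj(n);
    for (const auto& [key, value] : items) {
      std::size_t a = slotA(key), b = m_ + slotB(key);
      std::size_t ra = find(a), rb = find(b);
      if (ra == rb) return false;  // a cycle could make the XOR constraints unsatisfiable
      parent[ra] = rb;
      adj[a].push_back({b, value});
      adj[b].push_back({a, value});
    }
    // The graph is a forest: fix each tree root to 0 and give every child
    // parentValue ^ edgeValue, so both ends of an edge XOR to the key's value.
    std::vector<uint32_t> val(n, 0);
    std::vector<bool> seen(n, false);
    for (std::size_t s = 0; s < n; ++s) {
      if (seen[s]) continue;
      seen[s] = true;
      std::queue<std::size_t> q;
      q.push(s);
      while (!q.empty()) {
        std::size_t u = q.front();
        q.pop();
        for (const auto& [w, v] : adj[u]) {
          if (!seen[w]) { seen[w] = true; val[w] = val[u] ^ v; q.push(w); }
        }
      }
    }
    auto mid = val.begin() + static_cast<std::ptrdiff_t>(m_);
    arrA_.assign(val.begin(), mid);
    arrB_.assign(mid, val.end());
    return true;
  }
};

// Hypothetical classifier: look up every k-mer of the read and return the taxon ID
// with the most votes (ties go to the smaller ID; returns 0 if the read has no k-mer).
uint32_t classifyRead(const OthelloSketch& table, const std::string& read, std::size_t k) {
  std::unordered_map<uint32_t, std::size_t> votes;
  for (std::size_t i = 0; i + k <= read.size(); ++i) ++votes[table.query(read.substr(i, k))];
  uint32_t best = 0;
  std::size_t bestCount = 0;
  for (const auto& [taxon, count] : votes)
    if (count > bestCount || (count == bestCount && taxon < best)) { best = taxon; bestCount = count; }
  return best;
}
```

Because a lookup never compares keys, its cost stays constant no matter how many taxa the table holds. The trade-off is that a k-mer absent from the table returns an arbitrary ID, so a practical classifier needs some additional rule to discount such hits. The simple vote above does not provide one.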

References

[1]
Centrifuge: rapid and sensitive classification of metagenomic sequences.

Genome Res. 2016-12

[2]
Higher classification sensitivity of short metagenomic reads with CLARK-S.

Bioinformatics. 2016-12-15

[3]
Fast and sensitive taxonomic classification for metagenomics with Kaiju.

Nat Commun. 2016-4-13

[4]
An evaluation of the accuracy and speed of metagenome analysis tools.

Sci Rep. 2016-1-18

[5]
CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers.

BMC Genomics. 2015-3-25

[6]
Accurate read-based metagenome characterization using a hierarchical suite of unique signatures.

Nucleic Acids Res. 2015-5-26

[7]
Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods.

Bioinformatics. 2015-3-15

[8]
Kraken: ultrafast metagenomic sequence classification using exact alignments.

Genome Biol. 2014-3-3

[9]
Strain/species identification in metagenomes using genome-specific markers.

Nucleic Acids Res. 2014-2-12

[10]
Metagenomic species profiling using universal phylogenetic marker genes.

Nat Methods. 2013-10-20
