CoMeta: classification of metagenomes using k-mers.

Author information

Kawulok Jolanta, Deorowicz Sebastian

Affiliation

Institute of Informatics, Silesian University of Technology, Gliwice, Poland.

Publication information

PLoS One. 2015 Apr 17;10(4):e0121453. doi: 10.1371/journal.pone.0121453. eCollection 2015.

Abstract

The study of environmental samples is developing rapidly. Characterizing the composition of an environment broadens our knowledge of the relationship between species composition and environmental conditions. An important step in determining the composition of a sample is comparing the extracted DNA fragments with sequences derived from known organisms. In this paper, we introduce an algorithm called CoMeta (Classification of metagenomes), which assigns a query read (a DNA fragment) to one of the groups previously prepared by the user. Typically, each group corresponds to a taxon at a chosen taxonomic rank (e.g., phylum, genus); however, the prepared groups may instead contain sequences having various functions. CoMeta classifies reads with an exact method based on short subsequences (k-mers) and uses a fast program for indexing large sets of k-mers. In contrast to the most popular methods, which are based on BLAST and compare the query with each reference sequence, we begin the classification from the top of the taxonomy tree to reduce the number of comparisons. The presented experimental study confirms that CoMeta outperforms other programs used in this context. CoMeta is available at https://github.com/jkawulok/cometa under a free GNU GPL 2 license.
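The top-down idea in the abstract can be illustrated with a minimal Python sketch. It is not CoMeta's implementation: the `Group` class, the `score` and `classify` functions, the value k = 4, the 0.5 threshold, and the toy sequences are all assumptions made for illustration, and a plain Python set stands in for the fast k-mer indexing program the abstract mentions. The score used here (the fraction of a read's k-mers found exactly in a group's index) is one simple choice and may differ from the measure used in the paper.

```python
def kmers(seq, k):
    """Return every overlapping k-mer of seq, in order."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class Group:
    """A node of a toy taxonomy tree (hypothetical, not CoMeta's data structure)."""

    def __init__(self, name, sequences=(), children=(), k=4):
        self.name = name
        self.children = list(children)
        # The index holds the k-mers of this group's own reference sequences
        # plus those of every group below it in the tree.
        self.index = set()
        for seq in sequences:
            self.index.update(kmers(seq, k))
        for child in self.children:
            self.index |= child.index


def score(read, group, k):
    """Fraction of the read's k-mers that occur exactly in the group's index."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return sum(km in group.index for km in read_kmers) / len(read_kmers)


def classify(read, root, k=4, threshold=0.5):
    """Descend from the root, following the best-scoring child at each rank.

    Only the children of the current node are compared, so whole subtrees
    are skipped instead of comparing the read with every reference group.
    """
    path = []
    node = root
    while node.children:
        best = max(node.children, key=lambda g: score(read, g, k))
        if score(read, best, k) < threshold:
            break  # no child matches well enough; stop at the current rank
        path.append(best.name)
        node = best
    return path


# Toy reference data (made up for this example).
genus_a = Group("genus_A", sequences=["ACGTACGTAC"])
genus_b = Group("genus_B", sequences=["TTTTGGGGCC"])
genus_c = Group("genus_C", sequences=["AAAAACCCCC"])
phylum_1 = Group("phylum_1", children=[genus_a, genus_b])
phylum_2 = Group("phylum_2", children=[genus_c])
root = Group("root", children=[phylum_1, phylum_2])

print(classify("GTACGT", root))
print(classify("CATCAT", root))
```

In this toy tree, the read `GTACGT` is first assigned to `phylum_1` and then compared only with the genera inside that phylum, while `CATCAT` shares no 4-mer with any group and is left unclassified. Real reads and reference sets would need a much larger k and a disk-backed index rather than in-memory sets.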


