INDUS - a composition-based approach for rapid and accurate taxonomic classification of metagenomic sequences.

Author information

Bio-sciences R&D Division, TCS Innovation Labs, Tata Consultancy Services Limited, 1 Software Units Layout, Madhapur, Hyderabad - 500081, Andhra Pradesh, India.

Publication information

BMC Genomics. 2011 Nov 30;12 Suppl 3(Suppl 3):S4. doi: 10.1186/1471-2164-12-S3-S4.

Abstract

BACKGROUND

Taxonomic classification of metagenomic sequences is the first step in metagenomic analysis. Existing taxonomic classification approaches are of two types: similarity-based and composition-based. Similarity-based approaches, though accurate and specific, are extremely slow. Since metagenomic projects generate millions of sequences, adopting similarity-based approaches becomes virtually infeasible for research groups with modest computational resources. In this study, we present INDUS, a composition-based approach that incorporates the following novel features. First, INDUS discards the 'one genome-one composition' model adopted by existing compositional approaches. Second, INDUS uses 'compositional distance' information to identify appropriate assignment levels. Third, INDUS incorporates steps that attempt to reduce biases due to database representation.
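As a rough illustration of what "composition" and "compositional distance" can mean in practice, the sketch below compares a read with reference fragments by their tetranucleotide frequencies. It is not the INDUS algorithm: the abstract does not specify the word length, the distance measure, or the reference model, so the word length `K = 4`, the Euclidean metric, and the helper names `composition_vector`, `compositional_distance`, and `nearest_reference` are all assumptions made for this example. Storing several fragments per genome, rather than one vector per genome, is one possible reading of dropping the 'one genome-one composition' model.

```python
from collections import Counter
from itertools import product
import math

K = 4  # assumed word length; the abstract does not say which k INDUS uses
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # 256 tetranucleotides

def composition_vector(seq):
    """Normalized K-mer frequencies of seq, ordered like KMERS."""
    seq = seq.upper()
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[w] for w in KMERS)  # words containing N etc. are ignored
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[w] / total for w in KMERS]

def compositional_distance(u, v):
    """Euclidean distance between two composition vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_reference(read, references):
    """references maps a fragment name to its sequence.

    Returns (name, distance) for the compositionally closest fragment.
    """
    read_vec = composition_vector(read)
    scored = (
        (name, compositional_distance(read_vec, composition_vector(seq)))
        for name, seq in references.items()
    )
    return min(scored, key=lambda pair: pair[1])
```

A classifier built on this idea could use the returned distance to decide how specific an assignment to make, for example assigning a read at a higher taxonomic level when its closest reference is still far away. The thresholds for such a rule are not given in the abstract and are left out here.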

RESULTS

INDUS rapidly classifies sequences in both simulated and real metagenomic sequence data sets, with classification efficiency significantly higher than that of existing composition-based approaches. Although the classification efficiency of INDUS is observed to be comparable to that of similarity-based approaches, its binning time is 23-33 times lower than that of alignment-based approaches.

CONCLUSION

Given its rapid execution time and high classification efficiency, INDUS is expected to be of immense interest to researchers working in metagenomics and microbial ecology.

AVAILABILITY

A web-server for the INDUS algorithm is available at http://metagenomics.atc.tcs.com/INDUS/
