Analysis and comparison of very large metagenomes with fast clustering and functional annotation.

Affiliation

California Institute for Telecommunications and Information Technology, University of California, San Diego, La Jolla, California 92093, USA.

Publication

BMC Bioinformatics. 2009 Oct 28;10:359. doi: 10.1186/1471-2105-10-359.

Abstract

BACKGROUND

The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand.

RESULTS

The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes".
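The abstract does not say how the "ultra-fast sequence clustering algorithm" works. As a rough illustration only, the sketch below shows greedy incremental clustering, the general approach used by tools such as CD-HIT, followed by per-sample cluster counts of the kind that could support comparing metagenomes by cluster. It is not RAMMCAP's implementation. The `identity` function is an assumed, crude stand-in for alignment identity, built on Python's `difflib`. The function names, the 0.9 threshold and the sample labels are all hypothetical.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Crude identity proxy (an assumption, not real alignment identity):
    characters in matching blocks divided by the length of the shorter sequence."""
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.9) -> dict[str, list[str]]:
    """Greedy incremental clustering (hypothetical helper).
    Sequences are visited longest first. Each joins the first existing
    representative it matches at >= threshold, otherwise it founds a new cluster."""
    clusters: dict[str, list[str]] = {}
    for sid in sorted(seqs, key=lambda s: len(seqs[s]), reverse=True):
        for rep in clusters:
            if identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            # No representative matched, so this sequence starts a new cluster.
            clusters[sid] = [sid]
    return clusters


def cluster_profile(clusters: dict[str, list[str]],
                    sample_of: dict[str, str]) -> dict[str, dict[str, int]]:
    """Count how many members of each cluster come from each sample (hypothetical helper)."""
    profile: dict[str, dict[str, int]] = {}
    for rep, members in clusters.items():
        counts: dict[str, int] = {}
        for member in members:
            sample = sample_of[member]
            counts[sample] = counts.get(sample, 0) + 1
        profile[rep] = counts
    return profile


if __name__ == "__main__":
    reads = {"r1": "ATGCGTACGTTAGC", "r2": "ATGCGTACGTTAGG", "r3": "CCCCCCCCCCCCCCCC"}
    groups = greedy_cluster(reads)
    print(groups)
    print(cluster_profile(groups, {"r1": "A", "r2": "B", "r3": "A"}))
```

Visiting sequences longest first means each cluster's representative is its longest member, and every later sequence is compared only against representatives rather than against all other sequences. This is what makes the greedy approach cheap compared with all-against-all comparison.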

CONCLUSION

RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.
