Scalable metagenomics alignment research tool (SMART): a scalable, rapid, and complete search heuristic for the classification of metagenomic sequences from complex sequence populations.

Author information

Lee Aaron Y, Lee Cecilia S, Van Gelder Russell N

Affiliations

Department of Ophthalmology, University of Washington School of Medicine, Box 359608, 325 Ninth Avenue, Seattle, WA, 98104, USA.

Departments of Biological Structure and Pathology, University of Washington School of Medicine, Seattle, WA, USA.

Publication information

BMC Bioinformatics. 2016 Jul 28;17:292. doi: 10.1186/s12859-016-1159-6.

Abstract

BACKGROUND

Next-generation sequencing technology has enabled the characterization of metagenomes through massively parallel genomic DNA sequencing. The complexity and diversity of environmental samples such as the human gut microflora, combined with the sustained exponential growth in sequencing capacity, have made identifying microbial organisms by DNA sequence a challenge. We sought to validate the Scalable Metagenomics Alignment Research Tool (SMART), a novel search heuristic for shotgun metagenomic sequencing results.

RESULTS

After all genomic DNA sequences were retrieved from NCBI GenBank, over 1 × 10¹¹ base pairs from 3.3 × 10⁶ sequences spanning 9.25 × 10⁵ species were indexed using 4-base-pair hash table shards. A MapReduce search strategy was used to distribute the search workload across a computing cluster. In addition, a one-base-pair permutation algorithm was used to account for single nucleotide polymorphisms and sequencing errors. Simulated datasets used to evaluate Kraken, a similar metagenomics classification tool, were used to measure and compare precision and accuracy. Finally, using the same set of training sequences, we compared Kraken, CLARK, and SMART within the same computing environment. Using 12 computational nodes and exact matching, we classified each dataset in under 10 min, with an average throughput of over 1.95 × 10⁶ reads classified per minute. With permutation matching, we achieved sensitivity greater than 83% and precision greater than 94% on the simulated datasets at the species classification level. We demonstrated the application of this technique to conjunctival and gut microbiome metagenomic sequencing results. In our head-to-head comparison, SMART and CLARK had similar accuracy gains over Kraken at the species classification level, but SMART required approximately half the RAM of CLARK.
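The abstract names three mechanisms: 4-bp hash table shards, a MapReduce search, and one-base-pair permutation matching. The Python sketch that follows is a toy illustration of how these pieces could fit together. It is not SMART's implementation. Several details are assumptions that the abstract does not state. References are indexed as fixed-length k-mers, with K = 8 chosen only for readability. Each k-mer is routed to a shard keyed by its first 4 bases, which gives at most 4⁴ = 256 shards. Permuted lookups are tried only when the exact lookup misses. A read is assigned to the species with the most k-mer hits. The function names and the majority-vote rule are hypothetical, and the map and reduce steps run in one process rather than on a cluster.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

BASES = "ACGT"
PREFIX_LEN = 4  # shard key length from the abstract: 4 bp -> at most 4**4 = 256 shards
K = 8           # hypothetical k-mer length, kept tiny for illustration

Index = Dict[str, Dict[str, Set[str]]]  # shard key -> {k-mer -> species containing it}


def kmers(seq: str, k: int = K) -> Iterable[str]:
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_sharded_index(refs: Dict[str, str], k: int = K) -> Index:
    """Route each reference k-mer to the shard named by its first 4 bases."""
    shards: Index = defaultdict(lambda: defaultdict(set))
    for species, genome in refs.items():
        for km in kmers(genome, k):
            shards[km[:PREFIX_LEN]][km].add(species)
    return shards


def one_bp_variants(km: str) -> Iterable[str]:
    """Yield the 3*len(km) sequences that differ from km at exactly one base."""
    for i, base in enumerate(km):
        for alt in BASES:
            if alt != base:
                yield km[:i] + alt + km[i + 1:]


def lookup(shards: Index, km: str, permute: bool) -> Set[str]:
    """Exact lookup; if it misses and permute is set, try all 1-bp variants."""
    shard = shards.get(km[:PREFIX_LEN])
    hits = set(shard.get(km, ())) if shard else set()
    if permute and not hits:
        for var in one_bp_variants(km):
            # A substitution in the first 4 bases moves the variant to another shard.
            var_shard = shards.get(var[:PREFIX_LEN])
            if var_shard:
                hits |= var_shard.get(var, set())
    return hits


def map_read(shards: Index, read_id: str, read: str, permute: bool) -> List[Tuple[str, str]]:
    """Map step (hypothetical): emit (read_id, species) for every k-mer hit."""
    return [(read_id, sp) for km in kmers(read) for sp in lookup(shards, km, permute)]


def reduce_read(pairs: List[Tuple[str, str]]) -> Optional[str]:
    """Reduce step (hypothetical): majority vote over species hits for one read."""
    counts = Counter(sp for _, sp in pairs)
    return counts.most_common(1)[0][0] if counts else None


def classify(shards: Index, reads: Dict[str, str], permute: bool = True) -> Dict[str, Optional[str]]:
    """Classify each read independently, as a MapReduce job would per read."""
    return {rid: reduce_read(map_read(shards, rid, seq, permute)) for rid, seq in reads.items()}


if __name__ == "__main__":
    refs = {"species_A": "ACGTTGCAAGGCTTAC", "species_B": "TTTTGGGGCCCCAAAA"}
    shards = build_sharded_index(refs)
    # Read from species_A with one substitution (G -> A at index 5) shared by every 8-mer.
    print(classify(shards, {"r2": "ACGTTACAAGGC"}))  # → {'r2': 'species_A'}
```

In this toy example, no k-mer of the mutated read matches exactly, so exact matching leaves it unclassified. Permutation recovers it. The cost is the lookup count: each k-mer of length k has 3k single-substitution neighbours, so a k-mer that misses exactly can trigger up to 3k extra lookups.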

CONCLUSIONS

SMART is the first scalable, efficient, and rapid metagenomics classification algorithm capable of matching against all species and sequences present in NCBI GenBank. It allows single-step classification of microorganisms as well as of the large plant, mammalian, or invertebrate genomes from which the metagenomic sample may have been derived.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cb51/4963998/7af1af286b34/12859_2016_1159_Fig1_HTML.jpg