A big data approach to metagenomics for all-food-sequencing.

Author information

Department of Computer Science, Johannes Gutenberg University, Mainz, 55099, Germany.

IPCA, Polytechnic Institute of Cávado and Ave, Barcelos, 4750-810, Portugal.

Publication information

BMC Bioinformatics. 2020 Mar 12;21(1):102. doi: 10.1186/s12859-020-3429-6.

Abstract

BACKGROUND

All-Food-Sequencing (AFS) is an untargeted metagenomic sequencing method that allows for the detection and quantification of food ingredients including animals, plants, and microbiota. While this approach avoids some of the shortcomings of targeted PCR-based methods, it requires the comparison of sequence reads to large collections of reference genomes. The steadily increasing amount of available reference genomes establishes the need for efficient big data approaches.

RESULTS

We introduce an alignment-free, k-mer-based method for detecting and quantifying species composition in food and other complex biological matter. It is orders of magnitude faster than our previous alignment-based AFS pipeline. Compared with the established tools CLARK, Kraken2, and Kraken2+Bracken, it is superior in terms of false-positive rate and quantification accuracy. Furthermore, an efficient database partitioning scheme allows massive collections of reference genomes to be processed with reduced memory requirements on a workstation (AFS-MetaCache) or on a Spark-based compute cluster (MetaCacheSpark).
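To make the general idea concrete, the following is a minimal Python sketch of alignment-free k-mer matching against a partitioned reference database. It is a toy illustration under stated assumptions, not the AFS-MetaCache or MetaCacheSpark implementation: the function names, the tiny k-mer length, the `min_hits` threshold, the made-up sequences, and the species labels are all hypothetical, and estimating abundance as the fraction of classified reads is a deliberate simplification.

```python
from collections import Counter, defaultdict

K = 5  # toy k-mer length (assumption); real tools use much larger k


def kmers(seq, k=K):
    """Return the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k=K):
    """Map each k-mer to the set of species whose reference contains it."""
    index = defaultdict(set)
    for species, seq in references.items():
        for km in kmers(seq, k):
            index[km].add(species)
    return index


def classify(read, partitions, k=K, min_hits=3):
    """Assign a read to the species with the most shared k-mers.

    Partitions are queried one at a time and their hit counts merged, so
    in a real setting only one partition would need to be in memory at
    once. Returns None for too few hits or a tie between species.
    """
    total = Counter()
    read_kmers = kmers(read, k)
    for index in partitions:
        for km in read_kmers:
            for species in index.get(km, ()):
                total[species] += 1
    ranked = total.most_common(2)
    if not ranked or ranked[0][1] < min_hits:
        return None
    if len(ranked) > 1 and ranked[1][1] == ranked[0][1]:
        return None  # ambiguous: two species tie for the top count
    return ranked[0][0]


def quantify(reads, partitions, k=K):
    """Estimate composition as the fraction of classified reads per species."""
    labels = (classify(read, partitions, k) for read in reads)
    counts = Counter(label for label in labels if label is not None)
    n = sum(counts.values())
    return {species: c / n for species, c in counts.items()} if n else {}


# Hypothetical toy references; the sequences and labels are invented.
COW = "ACGTACGGTCAAGCTTAGGCTA"
PIG = "TTGACCGATAGCCGTTAACGGA"

# Two partitions, each indexing a disjoint subset of the references.
partitions = [build_index({"cow": COW}), build_index({"pig": PIG})]

# Simulated reads: substrings of the references plus one unrelated read.
reads = [COW[2:14], PIG[5:17], COW[8:20], "G" * 12]

abundance = quantify(reads, partitions)
print(abundance)
```

Querying partitions independently and merging the per-species hit counts is what lets the reference collection grow beyond the memory of a single machine in this sketch; the same merge step could in principle run across cluster workers rather than in a loop.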

CONCLUSIONS

We present a fast yet accurate screening method for whole genome shotgun sequencing-based biosurveillance applications such as food testing. By relying on a big data approach it can scale efficiently towards large-scale collections of complex eukaryotic and bacterial reference genomes. AFS-MetaCache and MetaCacheSpark are suitable tools for broad-scale metagenomic screening applications. They are available at https://muellan.github.io/metacache/afs.html (C++ version for a workstation) and https://github.com/jmabuin/MetaCacheSpark (Spark version for big data clusters).
