McDonald Daniel, Kaehler Benjamin, Gonzalez Antonio, DeReus Jeff, Ackermann Gail, Marotz Clarisse, Huttley Gavin, Knight Rob
Department of Pediatrics, University of California San Diego, La Jolla, California, USA.
School of Science, University of New South Wales, Canberra, Australia.
mSystems. 2019 Jun 25;4(4):e00215-19. doi: 10.1128/mSystems.00215-19.
Meta-analyses at the whole-community level have been important in microbiome studies, revealing profound features that structure Earth's microbial communities, such as the unique differentiation of microbes from the mammalian gut relative to free-living microbial communities, the separation of microbiomes in saline and nonsaline environments, and the role of pH in driving soil microbial compositions. However, our ability to identify the specific features of a microbiome that differentiate these community-level patterns has lagged behind, especially as ever-cheaper DNA sequencing has yielded increasingly large data sets. One critical gap is the ability to search for samples that contain specific features (for example, sub-operational taxonomic units [sOTUs] identified by high-resolution statistical methods for removing amplicon sequencing errors). Here we introduce redbiom, a microbiome caching layer that allows users to rapidly query samples that contain a given feature, retrieve sample data and metadata, and search for samples that match specified metadata values or ranges (e.g., all samples with a pH of >7). It is implemented using Redis, an in-memory NoSQL database. By default, redbiom provides anonymous access to over 100,000 publicly available samples in the Qiita database. At over 100,000 samples, the caching server requires only 35 GB of resident memory. We highlight how redbiom enables a new type of characterization of microbiome samples and provide tutorials for using redbiom with QIIME 2. redbiom is open source under the BSD license, hosted on GitHub, and can be deployed independently of Qiita to enable search of proprietary or clinically restricted microbiome databases.
Although analyses that combine many microbiomes at the whole-community level have become routine, searching rapidly for microbiomes that contain a particular sequence has remained difficult. The software we present here, redbiom, dramatically accelerates this process, allowing samples that contain microbiome features to be rapidly identified. This is especially useful when taxonomic annotation is limited, allowing users to identify environments in which unannotated microbes of interest were previously observed. This approach also allows environmental or clinical factors that correlate with specific features, or vice versa, to be identified rapidly, even at a scale of billions of sequences in hundreds of thousands of samples. The software is integrated with existing analysis tools to enable fast, large-scale microbiome searches and discovery of new microbiome relationships.
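The two query types described above (samples that contain a given feature, and samples whose metadata fall in a given range) can be illustrated with a small inverted index. The sketch below is only an illustration under stated assumptions: it uses plain Python dictionaries and sets as a stand-in for an in-memory key-value store, and the class name `FeatureIndex`, its methods, the sample IDs, and the short sequences are all hypothetical. It does not reproduce redbiom's API or its actual Redis schema.

```python
from collections import defaultdict


class FeatureIndex:
    """Toy inverted index (hypothetical, not redbiom's data model).

    Maps each feature to the set of sample IDs that contain it and
    keeps a metadata dictionary per sample.
    """

    def __init__(self):
        self._samples_by_feature = defaultdict(set)
        self._metadata = {}

    def add_sample(self, sample_id, features, metadata):
        # Store the metadata and register the sample under each feature.
        self._metadata[sample_id] = dict(metadata)
        for feature in features:
            self._samples_by_feature[feature].add(sample_id)

    def samples_with_feature(self, feature):
        # Return a copy so callers cannot mutate the index.
        return set(self._samples_by_feature.get(feature, set()))

    def samples_where(self, field, predicate):
        # Samples that lack the field are skipped rather than treated as matches.
        return {
            sample_id
            for sample_id, md in self._metadata.items()
            if field in md and predicate(md[field])
        }


# Made-up samples and truncated sequences, for illustration only.
index = FeatureIndex()
index.add_sample("soil.1", ["TACGGAGG", "TACGTAGG"], {"ph": 7.8, "env": "soil"})
index.add_sample("soil.2", ["TACGTAGG"], {"ph": 5.2, "env": "soil"})
index.add_sample("gut.1", ["TACGGAGG"], {"env": "gut"})

# Combine a feature query with a metadata range query (pH > 7).
alkaline = index.samples_where("ph", lambda value: value > 7)
hits = index.samples_with_feature("TACGGAGG") & alkaline
print(sorted(hits))
```

Because each feature maps directly to a set of sample IDs, a lookup does not need to scan every sample, and combining it with a metadata filter reduces to a set intersection. This is the general idea behind serving such queries from an in-memory store.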