Biofilm marker discovery with cloud-based dockerized metagenomics analysis of microbial communities.

Affiliations

Biomedical Engineering Department, University of South Dakota, 4800 N. Career Ave., Suite 221, Sioux Falls, South Dakota, 57107, United States.

Google Cloud, 1900 Reston Metro Plaza, Reston, Virginia, 20190, United States.

Publication information

Brief Bioinform. 2024 Jul 23;25(Supplement_1). doi: 10.1093/bib/bbae429.

PMID:39266450

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11392556/

Abstract

In an environment, microbes often work in communities to achieve most of their essential functions, including the production of essential nutrients. Microbial biofilms are communities of microbes that attach to a nonliving or living surface by embedding themselves in a self-secreted matrix of extracellular polymeric substances. These communities work together to enhance their colonization of surfaces, produce essential nutrients, and carry out the functions they need for growth and survival. They often consist of diverse microbes, including bacteria, viruses, and fungi. Biofilms play a critical role in shaping plant phenotypes and in human microbial infections. Understanding how these biofilms affect plant health, human health, and the environment is important for analyzing genotype-phenotype-driven rule-of-life functions. Such fundamental knowledge can be used to precisely control the growth of biofilms on a given surface. Metagenomics is a powerful tool for analyzing biofilm genomes through function-based gene and protein sequence identification (functional metagenomics) and sequence-based function identification (sequence metagenomics). Metagenomic sequencing enables comprehensive sampling of all genes in all organisms present within a biofilm sample. However, the complexity of biofilm metagenomic studies makes it increasingly necessary to follow the Findability, Accessibility, Interoperability, and Reusability (FAIR) Guiding Principles for scientific data management, so that scientific findings can be more easily validated by the research community. This study proposes a dockerized, self-learning bioinformatics workflow to increase community adoption of metagenomics toolkits in metagenomics and meta-transcriptomics investigations. Our biofilm metagenomics self-learning module integrates learning resources with an interactive dockerized workflow.
This module allows learners to analyze resources that help aggregate knowledge about biofilm marker genes, proteins, and metabolic pathways, as these define the composition of specific microbial communities. Cloud and Docker technologies allow novice learners, even those with minimal background in computer science, to use complex bioinformatics tools. Our cloud-based, dockerized workflow divides biofilm microbiome metagenomics analysis into four easy-to-follow submodules, each with a variety of built-in tools. As students work through these submodules, they learn about each tool used to accomplish the task. Downstream analysis uses either processed data obtained from online resources or raw data processed with Nextflow pipelines, and it runs in a Vertex AI Jupyter notebook instance with R and Python kernels. Results are then stored in Google Cloud Storage buckets and visualized there, reducing the computational burden on local resources. The result is a comprehensive tutorial that guides bioinformaticians of any skill level through the entire integrated workflow, enabling them to understand and implement each required step from start to finish. This manuscript describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox [1] at the beginning of this Supplement. This module delivers learning materials on the analysis of bulk and single-cell ATAC-seq data in an interactive format that uses appropriate cloud resources for data access and analyses.
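
The downstream step of summarizing community composition in a Jupyter notebook with a Python kernel can be illustrated with a minimal sketch. It assumes a hypothetical per-sample table of read counts per taxon; the taxon names, counts, and function names below are invented for illustration and are not taken from the module, whose own notebooks may rely on dedicated R or Python ecology packages instead. The sketch converts raw counts to relative abundances and computes the Shannon diversity index, a common single-number summary of community composition.

```python
import math
from typing import Dict

# Hypothetical input: read counts per taxon for one biofilm sample.
Counts = Dict[str, int]


def relative_abundance(counts: Counts) -> Dict[str, float]:
    """Convert raw per-taxon read counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(counts: Counts) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i), skipping taxa with zero reads."""
    props = relative_abundance(counts)
    return -sum(p * math.log(p) for p in props.values() if p > 0)


if __name__ == "__main__":
    # Illustrative values only; not data from the study.
    sample = {"Pseudomonas": 50, "Staphylococcus": 30, "Candida": 20}
    for taxon, p in sorted(relative_abundance(sample).items()):
        print(f"{taxon}\t{p:.2f}")
    print(f"Shannon H' = {shannon_index(sample):.3f}")
```

This sketch uses the natural logarithm; some tools report Shannon diversity in base 2 instead, so values are only comparable when the same base is used.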
