Transcriptomics and epigenetic data integration learning module on Google Cloud.

Affiliations

Department of Biomedical Engineering, University of North Dakota, 501 N. Columbia Road Stop 8380, Grand Forks, ND 58202, United States.

Department of Chemistry and Physics, Drury University, 900 N. Benton Avenue, Springfield, MO 65802, United States.

Publication information

Brief Bioinform. 2024 Jul 23;25(Supplement_1). doi: 10.1093/bib/bbae352.

Abstract

Multi-omics (genomics, transcriptomics, epigenomics, proteomics, metabolomics, etc.) research approaches are vital for understanding the hierarchical complexity of human biology and have proven extremely valuable in cancer research and precision medicine. Recent scientific advances have made high-throughput genome-wide sequencing a central focus of molecular research by allowing the collective analysis of various kinds of molecular biological data from different types of specimens, in a single tissue or even at the level of a single cell. Additionally, with the help of improved computational resources and data mining, researchers are able to integrate data from different multi-omics regimes to identify new prognostic, diagnostic, or predictive biomarkers, uncover novel therapeutic targets, and develop more personalized treatment protocols for patients. To extract scientifically and clinically meaningful information from the biological data generated each day more efficiently and with fewer wasted resources, the research community must become familiar and comfortable with advanced analytical tools such as Google Cloud Platform. This project is an interdisciplinary, cross-organizational effort to provide a guided learning module that integrates transcriptomics and epigenetics data analysis protocols into a comprehensive analysis pipeline, which users can implement in their own work using the cloud computing infrastructure on Google Cloud. The learning module consists of three submodules that guide the user through tutorial examples illustrating the analysis of RNA sequencing (RNA-seq) and Reduced-Representation Bisulfite Sequencing (RRBS) data. The examples take the form of breast cancer case studies, and the data sets were obtained from the public repository Gene Expression Omnibus.
The first submodule is devoted to transcriptomics analysis of the RNA sequencing data, the second submodule focuses on epigenetics analysis of the DNA methylation data, and the third submodule integrates the two methods for a deeper biological understanding. The modules begin with data collection and preprocessing, with further downstream analysis performed in a Vertex AI Jupyter notebook instance with an R kernel. Analysis results are returned to Google Cloud buckets for storage and visualization, removing the computational strain from local resources. The final product is a start-to-finish tutorial that lets researchers with limited experience in multi-omics integrate transcriptomics and epigenetics data analysis into a comprehensive pipeline for their own biological research.
This manuscript describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox [16] at the beginning of this Supplement. This module delivers learning materials on the analysis of bulk and single-cell ATAC-seq data in an interactive format that uses appropriate cloud resources for data access and analyses.
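
The integration step performed by the third submodule can be pictured with a small sketch. The abstract does not specify how the module combines the two data types, so everything below is an assumption made for illustration: the gene names, the numeric values, the cutoffs, and the rule of pairing a change in promoter methylation with an opposite change in expression. The module itself runs in an R kernel; Python is used here only to keep the sketch self-contained.

```python
# Illustrative only: gene names, values, and cutoffs are hypothetical
# and are not taken from the paper or its data sets.

# Per-gene log2 fold change from a differential expression analysis.
expression = {"GENE_A": -2.1, "GENE_B": 1.8, "GENE_C": -0.2, "GENE_D": 2.5}

# Per-gene difference in promoter methylation fraction (case minus control).
methylation = {"GENE_A": 0.35, "GENE_B": -0.30, "GENE_C": 0.40, "GENE_E": 0.25}


def integrate(expression, methylation, lfc_cutoff=1.0, meth_cutoff=0.2):
    """Return genes whose expression and methylation change in opposite directions.

    Only genes present in both inputs are considered, and both changes
    must meet their (assumed) magnitude cutoffs.
    """
    hits = {}
    for gene in sorted(expression.keys() & methylation.keys()):
        lfc = expression[gene]
        dmeth = methylation[gene]
        if abs(lfc) < lfc_cutoff or abs(dmeth) < meth_cutoff:
            continue  # change too small in at least one data type
        if lfc < 0 and dmeth > 0:
            hits[gene] = "hypermethylated_downregulated"
        elif lfc > 0 and dmeth < 0:
            hits[gene] = "hypomethylated_upregulated"
    return hits


if __name__ == "__main__":
    for gene, label in integrate(expression, methylation).items():
        print(gene, label)
```

In this toy input, only genes measured in both data types and passing both cutoffs are reported. A real analysis would start from the statistical output of the transcriptomics and methylation submodules and would also account for significance, not just effect size.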
