GEMmaker: process massive RNA-seq datasets on heterogeneous computational infrastructure.

Affiliations

Molecular Plant Sciences Program, Washington State University, Pullman, WA, USA.

Department of Horticulture, Washington State University, Pullman, WA, USA.

Publication information

BMC Bioinformatics. 2022 May 2;23(1):156. doi: 10.1186/s12859-022-04629-7.

Abstract

BACKGROUND

Quantification of gene expression from RNA-seq data is a prerequisite for transcriptome analyses such as differential gene expression analysis and gene co-expression network construction. Individual RNA-seq experiments are growing larger, and combining multiple experiments from sequence repositories can produce datasets with thousands of samples. Processing hundreds to thousands of RNA-seq samples raises challenges related to data management, access to sufficient computational resources, navigation of high-performance computing (HPC) systems, installation of required software dependencies, and reproducibility. Processing of larger and deeper RNA-seq experiments will become more common as sequencing technology matures.

RESULTS

GEMmaker is an nf-core compliant Nextflow workflow that quantifies gene expression from small to massive RNA-seq datasets. GEMmaker ensures results are highly reproducible through the use of versioned, containerized software that can be executed on a single workstation, an institutional compute cluster, a Kubernetes platform, or the cloud. GEMmaker supports popular alignment and quantification tools and provides results in raw and normalized formats. GEMmaker is unique in that it can scale to process thousands of locally or remotely stored samples without exceeding available data storage.
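To make the distinction between raw and normalized results concrete, the following is a minimal Python sketch of one common normalization, transcripts per million (TPM). The abstract does not say which normalized formats GEMmaker emits, so the choice of TPM, the function name `tpm`, and its parameters are assumptions made only for illustration.

```python
def tpm(raw_counts, gene_lengths_bp):
    """Convert raw read counts to TPM (illustrative; not GEMmaker code).

    raw_counts:      reads assigned to each gene in one sample
    gene_lengths_bp: length of each gene in base pairs, same order
    """
    if len(raw_counts) != len(gene_lengths_bp):
        raise ValueError("counts and lengths must have the same length")

    # Step 1: length-normalize, giving reads per base for each gene.
    rates = [count / length for count, length in zip(raw_counts, gene_lengths_bp)]

    # Step 2: rescale so the values of one sample sum to one million.
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [rate / total * 1_000_000 for rate in rates]
```

Because each sample is rescaled to the same total, TPM values can be compared across samples of different sequencing depth, which raw counts alone do not allow.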

CONCLUSIONS

Workflows that quantify gene expression are not new, and many already address issues of portability, reusability, and scale in terms of access to CPUs. GEMmaker provides these benefits and adds the ability to scale even where data storage infrastructure is limited. This allows users to process hundreds to thousands of RNA-seq samples even when data storage resources are limited. GEMmaker is freely available and fully documented with step-by-step setup and execution instructions.
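The general idea of processing many samples under a fixed storage budget can be sketched in Python as follows. This is not GEMmaker's implementation (GEMmaker is a Nextflow workflow, and the abstract does not describe its internals). It assumes that samples are handled in small batches and that each batch's large intermediate files are deleted once its small results are saved. The names `process_in_batches` and `quantify` are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def process_in_batches(sample_ids, quantify, results_dir, batch_size=2):
    """Quantify samples batch by batch, cleaning scratch space after each batch.

    quantify(sample_id, scratch_dir) is a caller-supplied (hypothetical)
    function that may write large intermediates into scratch_dir and
    returns the small result table as text.
    """
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    result_paths = []

    for start in range(0, len(sample_ids), batch_size):
        batch = sample_ids[start:start + batch_size]
        # Fresh scratch directory per batch; at most batch_size samples
        # have their intermediates on disk at any one time.
        work = Path(tempfile.mkdtemp(prefix="batch_"))
        try:
            for sample_id in batch:
                scratch = work / sample_id
                scratch.mkdir()
                table = quantify(sample_id, scratch)
                out_path = results_dir / f"{sample_id}.tsv"
                out_path.write_text(table)  # keep only the small result
                result_paths.append(out_path)
        finally:
            # Remove downloads, alignments, etc. even if a sample failed.
            shutil.rmtree(work, ignore_errors=True)

    return result_paths
```

With this pattern, peak scratch usage depends on `batch_size` rather than on the total number of samples, which is the property the conclusions describe.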

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8554/9063052/2bc67d7a68d0/12859_2022_4629_Fig1_HTML.jpg