SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses.

Affiliations

Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA.

Winter Genomics, Salaverry 874 int 100, Lindavista, CDMX, 07300, Mexico.

Publication information

BMC Bioinformatics. 2021 May 1;22(1):224. doi: 10.1186/s12859-021-04142-3.

DOI: 10.1186/s12859-021-04142-3

PMID: 33932985

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8088074/

Abstract

BACKGROUND

RNA sequencing (RNA-seq) is a common and widespread biological assay, and an increasing amount of data is generated with it. In practice, a researcher must perform a large number of individual steps before raw RNA-seq reads yield directly valuable information, such as differential gene expression data. Existing software tools are typically specialized, performing only one step of a larger workflow, such as alignment of reads to a reference genome. The demand for a more comprehensive and reproducible workflow has led to the production of a number of publicly available RNA-seq pipelines. However, we have found that most require computational expertise to set up or share among several users, are not actively maintained, or lack features we have found to be important in our own analyses.

RESULTS

In response to these concerns, we have developed a Scalable Pipeline for Expression Analysis and Quantification (SPEAQeasy), which is easy to install and share, and provides a bridge towards R/Bioconductor downstream analysis solutions. SPEAQeasy is portable across computational frameworks (SGE, SLURM, local, Docker integration), and different configuration files are provided (http://research.libd.org/SPEAQeasy/).

CONCLUSIONS

SPEAQeasy is user-friendly and lowers the computational-domain entry barrier to RNA-seq data processing for biologists and clinicians, as the main input file is a table with sample names and their corresponding FASTQ files. The goal is to provide a flexible pipeline that is immediately usable by researchers, regardless of their technical background or computing environment.
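To make the idea of a sample table concrete, the following is a minimal Python sketch of how such a table could be assembled from a list of FASTQ files. It is an illustration only: the file-naming convention (`<sample>_R1.fastq.gz`, `<sample>_R2.fastq.gz`, or `<sample>.fastq.gz`), the function names `build_sample_table` and `to_tsv`, and the column layout are all assumptions made here, not the manifest format that SPEAQeasy itself defines; its documentation should be consulted for the actual layout.

```python
import csv
import io
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical naming convention (an assumption for illustration only):
#   <sample>_R1.fastq.gz / <sample>_R2.fastq.gz for paired-end reads,
#   <sample>.fastq.gz (or .fastq) for single-end reads.
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+?)(?:_R[12])?\.fastq(?:\.gz)?$")


def build_sample_table(fastq_paths):
    """Group FASTQ paths by inferred sample name.

    Returns a list of (sample, [paths]) tuples sorted by sample name,
    with each sample's paths sorted so that R1 precedes R2.
    """
    by_sample = defaultdict(list)
    for path in fastq_paths:
        match = FASTQ_PATTERN.match(Path(path).name)
        if match is None:
            raise ValueError(f"not a recognized FASTQ file name: {path}")
        by_sample[match.group("sample")].append(str(path))
    return [(sample, sorted(paths)) for sample, paths in sorted(by_sample.items())]


def to_tsv(rows):
    """Render rows as tab-separated text: sample name, then its FASTQ files."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for sample, paths in rows:
        writer.writerow([sample, *paths])
    return buffer.getvalue()


if __name__ == "__main__":
    example = ["reads/b_R2.fastq.gz", "reads/b_R1.fastq.gz", "reads/a.fastq"]
    print(to_tsv(build_sample_table(example)), end="")
```

Keeping the table as plain tab-separated text means it can be inspected or edited in a spreadsheet, which fits the stated aim of lowering the entry barrier for users without a computational background.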

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f74/8088074/b58ea32a2d9c/12859_2021_4142_Fig1_HTML.jpg

Similar articles

ARMOR: An Automated Reproducible MOdular Workflow for Preprocessing and Differential Analysis of RNA-seq Data.

G3 (Bethesda). 2019 Jul 9;9(7):2089-2096. doi: 10.1534/g3.119.400185. Print 2019 Jul 1.

systemPipeR: NGS workflow and report generation environment.

BMC Bioinformatics. 2016 Sep 20;17:388. doi: 10.1186/s12859-016-1241-0.

BiocMAP: a Bioconductor-friendly, GPU-accelerated pipeline for bisulfite-sequencing data.

BMC Bioinformatics. 2023 Sep 13;24(1):340. doi: 10.1186/s12859-023-05461-3.

VIPER: Visualization Pipeline for RNA-seq, a Snakemake workflow for efficient and complete RNA-seq analysis.

BMC Bioinformatics. 2018 Apr 12;19(1):135. doi: 10.1186/s12859-018-2139-9.

SPARTA: Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis.

BMC Bioinformatics. 2016 Feb 4;17:66. doi: 10.1186/s12859-016-0923-y.

DolphinNext: a distributed data processing platform for high throughput genomics.

BMC Genomics. 2020 Apr 19;21(1):310. doi: 10.1186/s12864-020-6714-x.

A Computational Workflow for Analysis of 3' Tag-Seq Data.

Curr Protoc. 2023 Feb;3(2):e664. doi: 10.1002/cpz1.664.

RNASeqR: An R Package for Automated Two-Group RNA-Seq Analysis Workflow.

IEEE/ACM Trans Comput Biol Bioinform. 2021 Sep-Oct;18(5):2023-2031. doi: 10.1109/TCBB.2019.2956708. Epub 2021 Oct 7.

Grape RNA-Seq analysis pipeline environment.

Bioinformatics. 2013 Mar 1;29(5):614-21. doi: 10.1093/bioinformatics/btt016. Epub 2013 Jan 17.

Cited by

RnaXtract, a tool for extracting gene expression, variants, and cell-type composition from bulk RNA sequencing.

Sci Rep. 2025 Aug 24;15(1):31100. doi: 10.1038/s41598-025-16875-9.

Benchmark of cellular deconvolution methods using a multi-assay dataset from postmortem human prefrontal cortex.

Genome Biol. 2025 Apr 7;26(1):88. doi: 10.1186/s13059-025-03552-3.

3t-seq: automatic gene expression analysis of single-copy genes, transposable elements, and tRNAs from RNA-seq data.

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae467.

Systems biology dissection of PTSD and MDD across brain regions, cell types, and blood.

Science. 2024 May 24;384(6698):eadh3707. doi: 10.1126/science.adh3707.

Sex affects transcriptional associations with schizophrenia across the dorsolateral prefrontal cortex, hippocampus, and caudate nucleus.

Nat Commun. 2024 May 10;15(1):3980. doi: 10.1038/s41467-024-48048-z.

Benchmark of cellular deconvolution methods using a multi-assay reference dataset from postmortem human prefrontal cortex.

bioRxiv. 2024 Apr 7:2024.02.09.579665. doi: 10.1101/2024.02.09.579665.

Comparison of gene expression in living and postmortem human brain.

medRxiv. 2023 Nov 9:2023.11.08.23298172. doi: 10.1101/2023.11.08.23298172.

BiocMAP: a Bioconductor-friendly, GPU-accelerated pipeline for bisulfite-sequencing data.

BMC Bioinformatics. 2023 Sep 13;24(1):340. doi: 10.1186/s12859-023-05461-3.

Prioritization of potential causative genes for schizophrenia in placenta.

Nat Commun. 2023 May 15;14(1):2613. doi: 10.1038/s41467-023-38140-1.

The miR-124-AMPAR pathway connects polygenic risks with behavioral changes shared between schizophrenia and bipolar disorder.

Neuron. 2023 Jan 18;111(2):220-235.e9. doi: 10.1016/j.neuron.2022.10.031. Epub 2022 Nov 14.
