Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
Nat Protoc. 2013 Sep;8(9):1765-86. doi: 10.1038/nprot.2013.099. Epub 2013 Aug 22.
RNA sequencing (RNA-seq) has been rapidly adopted for the profiling of transcriptomes in many areas of biology, including studies into gene regulation, development and disease. Of particular interest is the discovery of differentially expressed genes across different conditions (e.g., tissues, perturbations) while optionally adjusting for other systematic factors that affect the data-collection process. There are a number of subtle yet crucial aspects of these analyses, such as read counting, appropriate treatment of biological variability, quality control checks and appropriate setup of statistical modeling. Several variations have been presented in the literature, and there is a need for guidance on current best practices. This protocol presents a state-of-the-art computational and statistical RNA-seq differential expression analysis workflow largely based on the free open-source R language and Bioconductor software and, in particular, on two widely used tools, DESeq and edgeR. Hands-on time for typical small experiments (e.g., 4-10 samples) can be <1 h, with computation time <1 d using a standard desktop PC.