Chen Yunshun, Lun Aaron T L, Smyth Gordon K
The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia; Department of Medical Biology, The University of Melbourne, Victoria, 3010, Australia.
Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, UK.
F1000Res. 2016 Jun 20;5:1438. doi: 10.12688/f1000research.8987.2. eCollection 2016.
In recent years, RNA sequencing (RNA-seq) has become a very widely used technology for profiling gene expression. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This article demonstrates a computational workflow for the detection of DE genes and pathways from RNA-seq data by providing a complete analysis of an RNA-seq experiment profiling epithelial cell subsets in the mouse mammary gland. The workflow uses R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, including alignment of read sequences, data exploration, differential expression analysis, visualization and pathway analysis. Read alignment and count quantification are conducted using the Rsubread package, and the statistical analyses are performed using the edgeR package. The differential expression analysis uses the quasi-likelihood functionality of edgeR.