Neuronal and Vascular Biology Group, UCL Institute of Ophthalmology, University College London, London, UK.
Endothelial Quiescence Group, Centre for Cancer Sciences, Biodiscovery Institute, School of Medicine, University of Nottingham, Nottingham, UK.
Methods Mol Biol. 2022;2441:369-426. doi: 10.1007/978-1-0716-2059-5_29.
RNA-seq is a common approach for comparing gene expression between experimental conditions or cell types, yielding information that can shed light on the biological processes involved and inform further hypotheses. While the protocols required to generate samples for sequencing can be performed in most research facilities, the subsequent computational analysis is often an area in which researchers have little experience. Here we present a user-friendly bioinformatics workflow that describes the methods required to take raw RNA sequencing data to interpretable results, built on widely used and well-documented tools. Data quality assessment and read trimming are performed with FastQC and Cutadapt, respectively. STAR is then used to map the trimmed reads to a reference genome, and the alignment is assessed with Qualimap. The mapped reads are quantified with featureCounts. DESeq2 is used to normalize the counts and perform differential expression analysis, identifying differentially expressed genes and preparing the data for functional enrichment analysis. Gene set enrichment analysis identifies enriched gene sets from the normalized count data, and clusterProfiler is used to perform functional enrichment against the GO, KEGG, and Reactome databases. Example figures of the functional enrichment results are also generated. The example data used in the workflow are derived from HUVECs, an in vitro model used in the study of endothelial cells, and are published and publicly available for download from the European Nucleotide Archive.
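As a rough illustration of how the read-level steps named in the abstract chain together, the following minimal sketch assembles the command lines for FastQC, Cutadapt, STAR, Qualimap, and featureCounts. It is not the chapter's own script. The single-end layout, gzip-compressed FASTQ input, adapter sequence, quality and length thresholds, thread count, and all file paths and function names are assumptions made for illustration. The functions only build argument lists; nothing is run unless `run_all` is called on a machine where the tools are installed.

```python
import subprocess

# Assumptions for illustration only (not taken from the chapter):
# single-end, gzip-compressed reads; an Illumina TruSeq adapter prefix;
# the thresholds, thread count, and path layout used below.
ADAPTER = "AGATCGGAAGAGC"
THREADS = 4


def sample_commands(sample, fastq, star_index, gtf, outdir):
    """Build the per-sample commands: FastQC -> Cutadapt -> STAR -> Qualimap.

    Returns (list of argv lists, path of the sorted BAM that STAR would write).
    """
    trimmed = f"{outdir}/{sample}.trimmed.fastq.gz"
    prefix = f"{outdir}/{sample}."
    # STAR appends this fixed suffix to the prefix for coordinate-sorted BAM output.
    bam = f"{prefix}Aligned.sortedByCoord.out.bam"
    cmds = [
        # Quality report on the raw reads (outdir must already exist).
        ["fastqc", "-o", outdir, fastq],
        # Trim 3' adapter, trim low-quality ends, drop reads shorter than 20 nt.
        ["cutadapt", "-a", ADAPTER, "-q", "20", "-m", "20",
         "-o", trimmed, fastq],
        # Map trimmed reads to a pre-built STAR genome index.
        ["STAR", "--runThreadN", str(THREADS),
         "--genomeDir", star_index,
         "--readFilesIn", trimmed,
         "--readFilesCommand", "zcat",
         "--outSAMtype", "BAM", "SortedByCoordinate",
         "--outFileNamePrefix", prefix],
        # Alignment quality assessment.
        ["qualimap", "rnaseq", "-bam", bam, "-gtf", gtf,
         "-outdir", f"{outdir}/{sample}_qualimap"],
    ]
    return cmds, bam


def feature_counts_command(bams, gtf, counts_path):
    """Build one featureCounts call that quantifies all BAM files together."""
    return ["featureCounts", "-T", str(THREADS), "-a", gtf,
            "-o", counts_path, *bams]


def run_all(commands):
    """Run commands in order, stopping at the first non-zero exit status."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the commands as plain lists keeps each step inspectable before anything is executed, and passing lists to `subprocess.run` avoids shell quoting issues. The count table written by featureCounts would then be the input to the DESeq2 and clusterProfiler steps, which run in R and are not sketched here.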