da Silveira Willian A, Hazard E Starr, Chung Dongjun, Hardiman Gary
MUSC Bioinformatics, Center for Genomics Medicine, Medical University of South Carolina (MUSC), Charleston, SC, USA.
Institute for Global Food Security, Queen's University Belfast, Belfast, UK.
Methods Mol Biol. 2019;1908:185-204. doi: 10.1007/978-1-4939-9004-7_13.
RNAseq is a powerful technique for global profiling of transcriptomes in healthy and diseased states. In this chapter we review pipelines for analyzing RNA sequencing data, from raw data to a systems-level analysis. We first give an overview of the workflow that generates mapped reads from FASTQ files, including quality control of the FASTQ files, filtering and trimming of reads, and alignment of reads to a genome. Then, we compare and contrast three popular options for determining differentially expressed (DE) transcripts (the Tuxedo pipeline, DESeq2, and limma/voom). Finally, we examine four tool sets to extrapolate biological meaning from the list of DE genes (GeneCards, The Human Protein Atlas, GSEA, and ToppGene). We emphasize the need to ask a concise scientific question and to clearly understand the strengths and limitations of the methods.