Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland, USA.
Department of Computer Science, Whiting School of Engineering, Johns Hopkins University, Baltimore, Maryland, USA.
Nat Protoc. 2016 Sep;11(9):1650-67. doi: 10.1038/nprot.2016.095. Epub 2016 Aug 11.
High-throughput sequencing of mRNA (RNA-seq) has become the standard method for measuring and comparing the levels of gene expression in a wide variety of species and conditions. RNA-seq experiments generate very large, complex data sets that demand fast, accurate and flexible software to reduce the raw read data to comprehensible results. HISAT (hierarchical indexing for spliced alignment of transcripts), StringTie and Ballgown are free, open-source software tools for comprehensive analysis of RNA-seq experiments. Together, they allow scientists to align reads to a genome, assemble transcripts including novel splice variants, compute the abundance of these transcripts in each sample and compare experiments to identify differentially expressed genes and transcripts. This protocol describes all the steps necessary to process a large set of raw sequencing reads and create lists of gene transcripts, expression levels, and differentially expressed genes and transcripts. The protocol's execution time depends on the computing resources, but it typically takes under 45 min of computer time. HISAT, StringTie and Ballgown are available from http://ccb.jhu.edu/software.shtml.
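The align, assemble, quantify sequence summarized in the abstract can be sketched as a dry-run command planner. This is a minimal illustration, not the protocol's own script. It assumes the HISAT2 executable (`hisat2`), `samtools` and `stringtie` are installed, and that reads are paired-end files named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz`. The function name, file names, index prefix and `mergelist.txt` are hypothetical placeholders. The final differential-expression step with Ballgown runs in R and is not shown.

```python
import shlex
from typing import List


def build_pipeline_commands(
    samples: List[str],
    index_prefix: str,
    annotation_gtf: str,
    threads: int = 8,
) -> List[List[str]]:
    """Return argv lists for a dry run of the workflow; nothing is executed."""
    commands: List[List[str]] = []
    t = str(threads)

    for s in samples:
        # 1. Align paired-end reads to the genome index.
        #    --dta reports alignments tailored for transcript assemblers.
        commands.append([
            "hisat2", "-p", t, "--dta", "-x", index_prefix,
            "-1", f"{s}_1.fastq.gz", "-2", f"{s}_2.fastq.gz",
            "-S", f"{s}.sam",
        ])
        # 2. Sort the alignments and write them as BAM.
        commands.append(["samtools", "sort", "-@", t, "-o", f"{s}.bam", f"{s}.sam"])
        # 3. Assemble transcripts for this sample, guided by the annotation.
        commands.append([
            "stringtie", "-p", t, "-G", annotation_gtf,
            "-o", f"{s}.gtf", "-l", s, f"{s}.bam",
        ])

    # 4. Merge per-sample assemblies into one transcript set.
    #    mergelist.txt (hypothetical name) must list the per-sample GTF files;
    #    this sketch does not create it.
    commands.append([
        "stringtie", "--merge", "-p", t, "-G", annotation_gtf,
        "-o", "merged.gtf", "mergelist.txt",
    ])

    for s in samples:
        # 5. Estimate abundances against the merged transcripts.
        #    -e limits estimation to the given transcripts; -B writes Ballgown input tables.
        commands.append([
            "stringtie", "-e", "-B", "-p", t, "-G", "merged.gtf",
            "-o", f"ballgown/{s}/{s}.gtf", f"{s}.bam",
        ])
    return commands


if __name__ == "__main__":
    # Print the planned commands for two hypothetical samples.
    for cmd in build_pipeline_commands(["sampleA", "sampleB"], "genome_index", "genes.gtf"):
        print(shlex.join(cmd))
```

Returning the commands as lists rather than running them lets the plan be reviewed first; each list could then be passed to `subprocess.run(cmd, check=True)` once the paths are confirmed.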