Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.
Nat Protoc. 2012 Mar 1;7(3):562-78. doi: 10.1038/nprot.2012.016.
Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of RNA-seq data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, the tools themselves assume little to no background in RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and the available computing resources, but typical experiments take less than 1 day of computer time and about 1 h of hands-on time.
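The workflow summarized in the abstract (align reads with TopHat, assemble transcripts with Cufflinks, then compare conditions) can be sketched as a small Python helper that only builds the command lines and does not run them. This is an illustrative sketch and not the protocol's own script: the function name `build_pipeline`, the file names (`genome`, `genes.gtf`, `genome.fa`, the `*_thout` and `*_clout` directories, `assemblies.txt`, `diff_out`) and the thread count are assumptions, and the `cuffmerge` and `cuffdiff` steps are named here as the usual companions in the Cufflinks suite, not taken from the abstract. The options shown should be checked against the installed versions of the tools.

```python
from typing import Dict, List, Tuple


def build_pipeline(
    conditions: Dict[str, List[Tuple[str, str]]],
    bowtie_index: str = "genome",      # assumed Bowtie index prefix
    annotation: str = "genes.gtf",     # assumed reference annotation
    genome_fasta: str = "genome.fa",   # assumed genome sequence file
    threads: int = 8,
) -> Tuple[List[List[str]], List[str]]:
    """Return (commands, assembly_manifest) for a paired-end experiment.

    `conditions` maps a condition label to a list of (mate1, mate2) FASTQ
    pairs, one pair per replicate. Nothing is executed here.
    """
    commands: List[List[str]] = []
    manifest: List[str] = []           # lines to write into assemblies.txt
    bams: Dict[str, List[str]] = {}

    for label, replicates in conditions.items():
        bams[label] = []
        for i, (mate1, mate2) in enumerate(replicates, start=1):
            sample = f"{label}_R{i}"
            align_dir = f"{sample}_thout"
            asm_dir = f"{sample}_clout"
            bam = f"{align_dir}/accepted_hits.bam"
            # Step 1: spliced alignment of the raw reads.
            commands.append(["tophat", "-p", str(threads), "-G", annotation,
                             "-o", align_dir, bowtie_index, mate1, mate2])
            # Step 2: per-sample transcript assembly.
            commands.append(["cufflinks", "-p", str(threads),
                             "-o", asm_dir, bam])
            bams[label].append(bam)
            manifest.append(f"{asm_dir}/transcripts.gtf")

    # Step 3: merge the per-sample assemblies listed in assemblies.txt.
    commands.append(["cuffmerge", "-g", annotation, "-s", genome_fasta,
                     "-p", str(threads), "assemblies.txt"])

    # Step 4: differential expression; replicates of one condition are
    # joined with commas, conditions are separate arguments.
    commands.append(["cuffdiff", "-o", "diff_out", "-b", genome_fasta,
                     "-p", str(threads), "-L", ",".join(bams), "-u",
                     "merged_asm/merged.gtf"]
                    + [",".join(paths) for paths in bams.values()])
    return commands, manifest


if __name__ == "__main__":
    cmds, manifest = build_pipeline({
        "C1": [("C1_R1_1.fq", "C1_R1_2.fq")],
        "C2": [("C2_R1_1.fq", "C2_R1_2.fq")],
    })
    for cmd in cmds:
        print(" ".join(cmd))
```

Returning argument lists rather than running the tools keeps the sketch inspectable: the commands can be printed for review, and the manifest lines written to `assemblies.txt`, before anything is passed to `subprocess.run`. Visualization with CummeRbund happens afterwards in R and is not covered here.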