Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA, 02142, USA.
CSIRO Ecosystem Sciences, Black Mountain Labs, Canberra, ACT 2601, Australia.
Nat Protoc. 2013 Aug;8(8):1494-512. doi: 10.1038/nprot.2013.084. Epub 2013 Jul 11.
De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.
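As a rough illustration of the assembly step this protocol begins with, the sketch below builds (but does not run) a command line for a paired-end Trinity assembly. The helper name `build_trinity_command`, the read file names and the default resource values are hypothetical and chosen only for illustration. The flag names follow what recent Trinity releases are understood to accept; older releases may use different options (for example, for the memory setting), so check `Trinity --help` for the installed version before relying on them.

```python
import shlex


def build_trinity_command(left_reads, right_reads, cpu=4, max_memory="10G"):
    """Return an argument list for a paired-end Trinity run.

    Hypothetical helper: it only assembles the arguments and never
    launches Trinity. Flag names are assumed from recent Trinity releases.
    """
    return [
        "Trinity",
        "--seqType", "fq",            # input reads are in FASTQ format
        "--left", left_reads,         # first-in-pair reads
        "--right", right_reads,       # second-in-pair reads
        "--CPU", str(cpu),            # number of threads to use
        "--max_memory", max_memory,   # memory ceiling (assumed flag name)
    ]


if __name__ == "__main__":
    # Placeholder file names; substitute the actual RNA-seq read files.
    cmd = build_trinity_command("reads_1.fq", "reads_2.fq")
    # Print a shell-quoted command for inspection instead of executing it.
    print(shlex.join(cmd))
```

Returning an argument list rather than a single string lets the command be inspected or logged first and later passed to `subprocess.run` without shell quoting problems.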