Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.
Nat Biotechnol. 2010 May;28(5):503-10. doi: 10.1038/nbt.1633. Epub 2010 May 2.
Massively parallel cDNA sequencing (RNA-Seq) provides an unbiased way to study a transcriptome, including both coding and noncoding genes. Until now, most RNA-Seq studies have depended crucially on existing annotations and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We applied it to mouse embryonic stem cells, neuronal precursor cells and lung fibroblasts to accurately reconstruct the full-length gene structures for most known expressed genes. We identified substantial variation in protein coding genes, including thousands of novel 5' start sites, 3' ends and internal coding exons. We then determined the gene structures of more than a thousand large intergenic noncoding RNA (lincRNA) and antisense loci. Our results open the way to direct experimental manipulation of thousands of noncoding RNAs and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes.