Programme in Cancer and Stem Cell Biology, Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore.
Cancer Science Institute of Singapore, National University of Singapore, Singapore, 117599, Singapore.
Genome Biol. 2021 Jan 22;22(1):44. doi: 10.1186/s13059-021-02261-x.
Deregulated gene expression is a hallmark of cancer; however, most studies to date have relied on short-read RNA sequencing data, which have inherent limitations. Here, we combine PacBio long-read isoform sequencing (Iso-Seq) and Illumina paired-end short-read RNA sequencing to comprehensively survey the transcriptome of gastric cancer (GC), a leading cause of global cancer mortality.
We performed full-length transcriptome analysis across 10 GC cell lines covering four major GC molecular subtypes (chromosomally unstable, Epstein-Barr virus positive, genomically stable and microsatellite unstable). We identify 60,239 non-redundant full-length transcripts, of which more than 66% are novel compared to current transcriptome databases. Novel isoforms are more likely to be cell-line and subtype specific, are expressed at lower levels, and have more exons and longer isoform and coding sequence lengths. Most novel isoforms use an alternate first exon and, compared to isoforms in other alternative splicing categories, are expressed at higher levels and exhibit higher variability. Collectively, we observe alternate promoter usage in 25% of detected genes, with the majority (84.2%) of known/novel promoter pairs exhibiting potential changes in their coding sequences. Mapping these alternate promoters to TCGA GC samples, we identify several cancer-associated isoforms, including novel variants of oncogenes. Tumor-specific transcript isoforms tend to alter protein coding sequences to a larger extent than other isoforms. Analysis of outcome data suggests that novel isoforms may impart additional prognostic information.
Our results provide a rich resource of full-length transcriptome data for deeper studies of GC and other gastrointestinal malignancies.