Laboratory of Computational Biology, Center for Human Genetics, KU Leuven, Leuven, Belgium.
Laboratory for the Molecular Biology of Leukemia, Center for Human Genetics, KU Leuven and Center for the Biology of Disease, VIB, Leuven, Belgium; Division of Hematology, Department of Cellular Biotechnologies and Hematology, 'Sapienza' University of Rome, Rome, Italy.
PLoS Genet. 2013;9(12):e1003997. doi: 10.1371/journal.pgen.1003997. Epub 2013 Dec 19.
RNA-seq is a promising technology to re-sequence protein-coding genes for the identification of single nucleotide variants (SNVs), while simultaneously obtaining information on structural variations and gene expression perturbations. We asked whether RNA-seq is suitable for the detection of driver mutations in T-cell acute lymphoblastic leukemia (T-ALL). These leukemias are caused by a combination of gene fusions, over-expression of transcription factors, and cooperative point mutations in oncogenes and tumor suppressor genes. We analyzed 31 T-ALL patient samples and 18 T-ALL cell lines by high-coverage paired-end RNA-seq. First, we optimized the detection of SNVs in RNA-seq data by comparing the results with exome re-sequencing data. We identified known driver genes with recurrent protein-altering variations, as well as several new candidates including H3F3A, PTK2B, and STAT5B. Next, we determined accurate gene expression levels from the RNA-seq data through normalization and batch effect removal, and used these to classify patients into T-ALL subtypes. Finally, we detected gene fusions, several of which can explain the over-expression of key driver genes such as TLX1, PLAG1, LMO1, or NKX2-1, while others result in novel fusion transcripts encoding activated kinases (SSBP2-FER and TPM3-JAK2) or involving MLLT10. In conclusion, we present novel analysis pipelines for variant calling, variant filtering, and expression normalization on RNA-seq data, and successfully applied these for the detection of translocations, point mutations, INDELs, exon-skipping events, and expression perturbations in T-ALL.
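The abstract describes optimizing SNV detection in RNA-seq data by comparing the calls with exome re-sequencing data. One simple way such a comparison could be scored is sketched below. This is an illustrative assumption, not the authors' pipeline: the `Variant` tuple, the `concordance` function, and the toy coordinates are all hypothetical, and the exome calls are treated as the reference set only for the purpose of the example.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """A single nucleotide variant, keyed by position and alleles (hypothetical type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def concordance(rna_calls: Iterable[Variant], exome_calls: Iterable[Variant]) -> dict:
    """Score RNA-seq SNV calls against exome calls used as a reference set.

    precision = shared calls / all RNA-seq calls
    recall    = shared calls / all exome calls
    """
    rna, exome = set(rna_calls), set(exome_calls)
    shared = rna & exome
    precision = len(shared) / len(rna) if rna else 0.0
    recall = len(shared) / len(exome) if exome else 0.0
    return {"shared": len(shared), "precision": precision, "recall": recall}


# Toy data: made-up coordinates, not taken from the study.
rna_calls = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr2", 200, "C", "T"),
    Variant("chr3", 300, "G", "A"),  # called in RNA-seq only
]
exome_calls = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr2", 200, "C", "T"),
    Variant("chr4", 400, "T", "C"),  # called in exome only
    Variant("chr5", 500, "G", "C"),  # called in exome only
]

result = concordance(rna_calls, exome_calls)
print(result)
```

In practice, exome-only calls in genes that are not expressed cannot be recovered from RNA-seq at all, so a fair comparison would probably restrict both sets to positions with adequate RNA-seq coverage before scoring.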