Division of Biostatistics and Bioinformatics, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center.
Department of Otolaryngology-Head and Neck Surgery.
Bioinformatics. 2018 Jun 1;34(11):1859-1867. doi: 10.1093/bioinformatics/bty004.
Current bioinformatics methods to detect changes in gene isoform usage between distinct phenotypes compare the expected relative isoform usage in each phenotype. These statistics model differences in isoform usage in normal tissues, which have stable regulation of gene splicing. Pathological conditions, such as cancer, can disrupt the regulation of splicing and thereby increase the heterogeneity of splice variant expression. Inferring events with such differential heterogeneity in gene isoform usage requires new statistical approaches.
We introduce Splice Expression Variability Analysis (SEVA) to model increased heterogeneity of splice variant usage between conditions (e.g. tumor and normal samples). SEVA uses a rank-based multivariate statistic that compares the variability of junction expression profiles within one condition to the variability within another. Simulated data show that SEVA is unique in modeling heterogeneity of gene isoform usage, and we use them to benchmark SEVA's performance against EBSeq, DiffSplice and rMATS, which model differential isoform usage instead of heterogeneity. We confirm the accuracy of SEVA in identifying known splice variants in head and neck cancer and perform cross-study validation of novel splice variants. A novel comparison of splice variant heterogeneity between subtypes of head and neck cancer demonstrated unanticipated similarity between the heterogeneity of gene isoform usage in HPV-positive and HPV-negative subtypes, and anticipated increased heterogeneity among HPV-negative samples with mutations in genes that regulate the splicing machinery. These results show that SEVA accurately models differential heterogeneity of gene isoform usage from RNA-seq data.
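To illustrate the idea of comparing within-condition variability of junction expression profiles with a rank-based statistic, here is a minimal Python sketch. It is not the GSReg implementation: it assumes a Kendall-tau-style pairwise dissimilarity (the fraction of junction pairs ordered differently in two samples, which depends only on within-sample ranks), averages it over all sample pairs within each condition, and, as a plainly swapped-in substitute for the paper's test, uses a simple one-sided permutation test. All function names are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def kendall_discordance(x, y):
    """Fraction of junction pairs ordered differently in samples x and y.

    Only the within-sample ordering (ranks) matters; tied pairs are not
    counted as discordant.
    """
    pairs = list(combinations(range(len(x)), 2))
    discordant = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) < 0)
    return discordant / len(pairs)


def mean_within_dissimilarity(samples):
    """Average pairwise dissimilarity among samples of one condition (needs >= 2 samples)."""
    return mean(kendall_discordance(a, b) for a, b in combinations(samples, 2))


def variability_difference(group_a, group_b):
    """Positive values mean group_b is more heterogeneous than group_a."""
    return mean_within_dissimilarity(group_b) - mean_within_dissimilarity(group_a)


def permutation_pvalue(group_a, group_b, n_perm=999, seed=0):
    """One-sided permutation p-value for 'group_b more variable than group_a'."""
    rng = random.Random(seed)
    observed = variability_difference(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if variability_difference(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical junction-expression profiles for one gene (3 junctions per sample).
normal = [[10, 5, 1], [12, 6, 2], [9, 4, 1]]   # consistent junction ordering
tumor = [[10, 5, 1], [1, 5, 10], [5, 10, 1]]   # heterogeneous junction ordering

if __name__ == "__main__":
    print("normal:", mean_within_dissimilarity(normal))
    print("tumor:", mean_within_dissimilarity(tumor))
    print("difference:", variability_difference(normal, tumor))
    print("p-value:", permutation_pvalue(normal, tumor))
```

In this toy example every normal sample orders the junctions identically, so the within-normal dissimilarity is 0, while the tumor samples disagree on junction order and give a mean dissimilarity of 2/3. Because the statistic depends only on ranks, it is insensitive to sample-wide scaling of expression.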
SEVA is implemented in the R/Bioconductor package GSReg.
bahman@jhu.edu or favorov@sensi.org or ejfertig@jhmi.edu.
Supplementary data are available at Bioinformatics online.