Bollas Audrey, Gaither Jeffrey, Schieffer Kathleen M, White Peter, Mardis Elaine R
The Office of Data Sciences, The Abigail Wexner Research Institute, Nationwide Children's Hospital, Columbus, OH, USA.
The Steve and Cindy Rasmussen Institute for Genomic Medicine, The Abigail Wexner Research Institute, Nationwide Children's Hospital, Columbus, OH, USA.
Commun Med (Lond). 2025 May 28;5(1):202. doi: 10.1038/s43856-025-00901-y.
Genetic variants play a pivotal role in the initiation and progression of many diseases, including cancer. Detecting these variants is the first step in understanding their contribution to disease mechanisms. RNA sequencing (RNA-Seq) has become a crucial assay in cancer research, offering insights beyond those provided by DNA sequencing. This study introduces VarRNA, a novel method that utilizes RNA-Seq data to classify single nucleotide variants and insertions/deletions from tumor transcriptomes.
VarRNA distinguishes transcriptome variants as germline, somatic, or artifact using a combination of two XGBoost machine learning models. These models were trained and validated using a cohort of pediatric cancer samples with paired tumor and normal DNA exome sequencing data serving as ground truth. We performed additional validation on RNA-Seq data from two distinct cancer datasets, demonstrating that VarRNA outperforms existing RNA variant calling methods.
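The abstract does not say how the two models are combined. One plausible arrangement, shown here purely as an assumption, is a cascade: a first model separates true variants from artifacts, and a second model labels the surviving variants as germline or somatic. In this minimal sketch the function `classify_variant`, the feature names `alt_reads` and `population_af`, the 0.5 threshold, and the toy scoring functions are all hypothetical stand-ins for trained XGBoost models; none of them come from VarRNA itself.

```python
from typing import Callable, Dict

Features = Dict[str, float]
Scorer = Callable[[Features], float]  # returns a probability in [0, 1]


def classify_variant(
    features: Features,
    p_true_variant: Scorer,
    p_somatic: Scorer,
    threshold: float = 0.5,
) -> str:
    """Hypothetical two-stage cascade (an assumption, not VarRNA's documented logic)."""
    # Stage 1: reject calls that the first model scores as likely artifacts.
    if p_true_variant(features) < threshold:
        return "artifact"
    # Stage 2: assign an origin to calls that passed the artifact filter.
    return "somatic" if p_somatic(features) >= threshold else "germline"


# Toy stand-ins for trained models; the rules and feature names are invented.
def toy_true_variant(f: Features) -> float:
    return 0.9 if f["alt_reads"] >= 5 else 0.1


def toy_somatic(f: Features) -> float:
    return 0.8 if f["population_af"] < 0.001 else 0.2


if __name__ == "__main__":
    calls = [
        {"alt_reads": 2, "population_af": 0.0},
        {"alt_reads": 20, "population_af": 0.0},
        {"alt_reads": 20, "population_af": 0.3},
    ]
    for call in calls:
        print(call, classify_variant(call, toy_true_variant, toy_somatic))
```

In practice each scorer would wrap a trained classifier's probability output; the cascade keeps the artifact decision independent of the germline/somatic decision.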
VarRNA identifies 50% of the variants detected by exome sequencing and also detects RNA variants that are absent from the paired tumor and normal DNA exome data. Some variants classified by VarRNA have variant allele frequencies that differ from those in the corresponding DNA exome data. Strikingly, this phenomenon is prevalent in cancer-driving genes, where VarRNA analysis of the RNA-Seq data shows variant allele expression to be much higher than expected from the exome sequencing data.
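The comparison between RNA and DNA variant allele frequencies can be illustrated with the standard definition, VAF = alternate reads / (reference reads + alternate reads). The sketch below is illustrative only: the helper names `vaf` and `vaf_shift` and the read counts are made up for the example and are not taken from the study.

```python
from typing import Optional


def vaf(ref_reads: int, alt_reads: int) -> Optional[float]:
    """Variant allele frequency; None when the site has no coverage."""
    total = ref_reads + alt_reads
    if total == 0:
        return None
    return alt_reads / total


def vaf_shift(dna_ref: int, dna_alt: int, rna_ref: int, rna_alt: int) -> Optional[float]:
    """RNA VAF minus DNA VAF; positive means the variant allele is
    over-represented in the transcriptome relative to the exome."""
    dna = vaf(dna_ref, dna_alt)
    rna = vaf(rna_ref, rna_alt)
    if dna is None or rna is None:
        return None
    return rna - dna


if __name__ == "__main__":
    # Made-up counts: 25/100 alternate reads in DNA, 75/100 in RNA.
    print(vaf_shift(dna_ref=75, dna_alt=25, rna_ref=25, rna_alt=75))
```

A large positive shift at a site corresponds to the pattern described above, where the variant allele is expressed at a higher fraction than its DNA allele frequency would suggest.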
These findings highlight the potential of RNA-Seq not only to uncover clinically relevant genetic variants but also to offer a deeper understanding of disease-specific expression dynamics that influence cancer pathogenesis, with implications for prognosis and therapeutic strategies.