RNA-SSNV: A Reliable Somatic Single Nucleotide Variant Identification Framework for Bulk RNA-Seq Data.

Author information

Long Qihan, Yuan Yangyang, Li Miaoxin

Affiliations

Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China.

Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China.

Publication information

Front Genet. 2022 Jun 30;13:865313. doi: 10.3389/fgene.2022.865313. eCollection 2022.

Abstract

Expressed somatic mutations may offer a unique advantage in identifying active cancer driver mutations. However, accurately calling mutations from RNA-seq data is difficult because of confounding factors such as RNA editing, reverse transcription, and gap alignment. In the present study, we propose a framework, RNA-SSNV (https://github.com/pmglab/RNA-SSNV), to call somatic single nucleotide variants (SSNVs) from tumor bulk RNA-seq data. Based on a comprehensive multi-filtering strategy and a machine-learning classification model trained on comprehensively curated features, RNA-SSNV achieved the best precision-recall rate (0.880-0.884) in a testing dataset and robustly retained an AUC of 0.94 for the precision-recall curve in three adult-based TCGA (The Cancer Genome Atlas) validation datasets. We further show that the somatic mutations called by RNA-SSNV tended to have a higher functional impact and therapeutic power in known driver genes. Furthermore, VAF (variant allele fraction) analysis revealed that subclones harboring expressed mutations had an evolutionary selection advantage and that RNA had higher detection power to rescue mutations missed at the DNA level. In sum, RNA-SSNV will be a useful approach to accurately call expressed somatic mutations for a more insightful analysis of cancer driver genes and carcinogenic mechanisms.
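
For readers unfamiliar with the VAF quantity mentioned above, the following is a minimal sketch of how a variant allele fraction is conventionally computed from read counts at a single site. The function name and the read counts are hypothetical and chosen only for illustration; this is not taken from the RNA-SSNV code base and does not reproduce any result from the study.

```python
def variant_allele_fraction(alt_reads: int, ref_reads: int) -> float:
    """Return alt / (alt + ref); 0.0 when the site has no coverage."""
    depth = alt_reads + ref_reads
    if depth == 0:
        return 0.0
    return alt_reads / depth


# Hypothetical read counts for one site, for illustration only.
dna_vaf = variant_allele_fraction(alt_reads=2, ref_reads=98)
rna_vaf = variant_allele_fraction(alt_reads=30, ref_reads=70)
print(f"DNA VAF = {dna_vaf:.2f}, RNA VAF = {rna_vaf:.2f}")
```

In this made-up example the variant is supported by few DNA reads but many RNA reads, the kind of situation in which an expressed mutation could be easier to detect from RNA than from DNA.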
