Adcolony Inc., Bellevue, WA 98004, USA.
Markey Cancer Center, University of Kentucky, Lexington, KY 40536, USA.
Bioinformatics. 2021 Jun 9;37(9):1189-1197. doi: 10.1093/bioinformatics/btaa957.
Somatic driver mutations in cancer that affect genes within the same pathway often show a mutually exclusive pattern across a cohort of patients. This mutual exclusivity signal has frequently been used to distinguish driver from passenger mutations and to investigate relationships among driver mutations. Current methods for de novo discovery of mutually exclusive mutational patterns are limited in two ways: heterogeneity in the background mutation rate can confound mutational patterns, and highly mutated genes can produce spurious patterns. In addition, most methods focus on a limited number of pre-selected genes and cannot perform genome-wide analysis because of computational inefficiency.
We introduce a statistical framework, MEScan, for accurate and efficient mutual exclusivity analysis at the genomic scale. The framework comprises a fast and powerful statistical test for mutual exclusivity that adjusts for the background mutation rate and for the impact of highly mutated genes, and a multi-step procedure for genome-wide screening that controls the false discovery rate. We demonstrate that MEScan identifies mutually exclusive gene sets more accurately than existing methods and is at least two orders of magnitude faster than most of them. By applying MEScan to data from four cancer types and to pan-cancer data, we identified several biologically meaningful mutually exclusive gene sets.
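To make the idea of testing for mutual exclusivity and then controlling the false discovery rate concrete, the following is a minimal Python sketch. It is not MEScan's test: it uses a plain permutation test on an exclusive-coverage score followed by the Benjamini-Hochberg adjustment, and it deliberately ignores patient-level background mutation rates, which is the confounder the framework above is designed to adjust for. The function names (`exclusive_coverage`, `permutation_pvalue`, `benjamini_hochberg`) and the 0/1 list encoding of mutations (one list per gene, one entry per patient) are assumptions made for illustration.

```python
import random


def exclusive_coverage(rows):
    """Count patients carrying a mutation in exactly one gene of the set.

    rows: one 0/1 list per gene, each of length n_patients (assumed layout).
    """
    n_patients = len(rows[0])
    return sum(1 for j in range(n_patients) if sum(r[j] for r in rows) == 1)


def permutation_pvalue(rows, n_perm=2000, seed=0):
    """One-sided permutation p-value for the exclusive-coverage score.

    Each gene's mutations are shuffled across patients independently, which
    preserves per-gene mutation counts but treats all patients as
    exchangeable (no background mutation rate adjustment).
    """
    rng = random.Random(seed)
    observed = exclusive_coverage(rows)
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(r, len(r)) for r in rows]
        if exclusive_coverage(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy cohort of 20 patients: genes A and B never co-occur.
    gene_a = [1] * 10 + [0] * 10
    gene_b = [0] * 10 + [1] * 10
    print(permutation_pvalue([gene_a, gene_b]))
```

In a genome-wide screen, one p-value would be computed per candidate gene set and the whole collection passed to `benjamini_hochberg`. A naive permutation scheme like this one scales poorly with the number of candidate sets, which is one reason a fast analytical test matters at the genomic scale.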
MEScan is available as an R package at https://github.com/MarkeyBBSRF/MEScan.
Supplementary data are available at Bioinformatics online.