Batmanov Kirill, Delabie Jan, Wang Junbai
Department of Pathology, Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway.
Department of Pathology, University Health Network, Toronto, ON, Canada.
Front Genet. 2019 Apr 2;10:282. doi: 10.3389/fgene.2019.00282. eCollection 2019.
Most somatic mutations in cancer occur outside of gene coding regions. These mutations may disrupt gene regulation by affecting protein-DNA interactions. Studying these disruptions is important for understanding tumorigenesis. However, current computational tools process DNA sequence variants individually when predicting their effect on protein-DNA binding. Thus, it is a daunting task to identify functional regulatory disturbances among the thousands of mutations in a patient. Previously, we reported and validated a pipeline for identifying functional non-coding somatic mutations in cancer patient cohorts by integrating diverse information such as gene expression, the spatial distribution of mutations, and a biophysical model for estimating protein binding affinity. Here, we present BayesPI-BAR2, a new user-friendly Python package based on that pipeline for integrative whole-genome sequence analysis. This may be the first prediction package that considers information from both multiple mutations and multiple patients. It is evaluated in follicular lymphoma and skin cancer patients, focusing on sequence variants in gene promoter regions. BayesPI-BAR2 is a useful tool for predicting functional non-coding mutations in whole-genome sequencing data: it allows identification of novel transcription factors (TFs) whose binding is altered by non-coding mutations in cancer. The BayesPI-BAR2 program can analyze multiple datasets of genome-wide mutations at once and generate concise, easily interpretable reports for potentially affected gene regulatory sites. The package is freely available at http://folk.uio.no/junbaiw/BayesPI-BAR2/.
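To make concrete the idea of scoring several mutations jointly and then pooling across patients, the following is a minimal illustrative sketch in Python. It is not the BayesPI-BAR2 API or its algorithm: the energy matrix `ENERGY`, the function names, the 0-based position convention, and the simple averaging across patients are all assumptions introduced here for illustration. The affinity is a toy Boltzmann-weighted sum over sequence windows, standing in for the biophysical binding model mentioned in the abstract.

```python
import math

# Hypothetical position-specific energy matrix (arbitrary units) for a
# 4-bp motif. Lower energy means stronger binding. Illustrative values
# only; not taken from BayesPI-BAR2.
ENERGY = [
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 2.0},
    {"A": 2.0, "C": 0.0, "G": 2.0, "T": 2.0},
    {"A": 2.0, "C": 2.0, "G": 0.0, "T": 2.0},
    {"A": 2.0, "C": 2.0, "G": 2.0, "T": 0.0},
]


def site_energy(site):
    """Binding energy of one motif-length site (sum over positions)."""
    return sum(column[base] for column, base in zip(ENERGY, site))


def affinity(seq):
    """Toy affinity: sum of Boltzmann weights over all motif-length windows."""
    k = len(ENERGY)
    return sum(
        math.exp(-site_energy(seq[i:i + k]))
        for i in range(len(seq) - k + 1)
    )


def apply_mutations(ref, mutations):
    """Apply all substitutions (0-based position, alternate base) together."""
    seq = list(ref)
    for pos, alt in mutations:
        seq[pos] = alt
    return "".join(seq)


def log_affinity_change(ref, mutations):
    """Log ratio of mutated to reference affinity for one patient."""
    mutated = apply_mutations(ref, mutations)
    return math.log(affinity(mutated) / affinity(ref))


def cohort_mean_change(ref, patients):
    """Average the per-patient log affinity change across a cohort."""
    changes = [log_affinity_change(ref, muts) for muts in patients.values()]
    return sum(changes) / len(changes)


if __name__ == "__main__":
    reference = "TTACGTTT"  # contains the toy consensus site ACGT
    cohort = {
        "patient_1": [(3, "T")],            # disrupts the consensus site
        "patient_2": [(3, "T"), (6, "A")],  # two mutations scored jointly
        "patient_3": [],                    # no mutation in this region
    }
    print(cohort_mean_change(reference, cohort))
```

A negative value suggests that, on average across the cohort, the mutations weaken binding in the region, and a positive value suggests a gain of binding. Applying all of a patient's mutations to the sequence before scoring is what distinguishes this from scoring each variant in isolation.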