Van den Eynden Jimmy, Larsson Erik
Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden.
Unit of Public Health and Genome, Public Health and Surveillance, Scientific Institute of Public Health, Brussels, Belgium.
Front Genet. 2017 Jun 8;8:74. doi: 10.3389/fgene.2017.00074. eCollection 2017.
Large cancer genome sequencing initiatives have led to the identification of cancer driver genes based on signals of positive selection in somatic mutation data. Additionally, the identification of purifying (negative) selection has the potential to reveal essential genes that may be of therapeutic interest. The most widely used way of quantifying selection pressures in protein-coding genes is the dN/dS metric, which compares non-synonymous to synonymous substitution rates. In this study, we examine whether and how this metric is influenced by the mutational processes that have been active during tumor evolution. We use exome sequencing data from six different cancer types from The Cancer Genome Atlas (TCGA) and demonstrate that dN/dS in its basic form, where uniform base substitution probabilities are assumed, is in fact strongly biased by these mutational processes. This is particularly true in malignant melanoma, where the mutational signature is characterized by a high number of UV-induced cytosine to thymine mutations at dipyrimidine dinucleotides. This increases the likelihood of random synonymous mutations occurring in hydrophobic amino acid codons, leading to reduced dN/dS ratios in genes encoding membrane proteins and falsely suggesting purifying selection in these genes. When this effect is corrected for by taking mutational signature-derived substitution probabilities into account, purifying selection is found to be limited and similar in all cancer types studied. Our results demonstrate that it is crucial to take mutational signatures into account when applying the dN/dS metric to cancer somatic mutation data.
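The bias described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy coding sequence, the observed counts, and the tenfold weight for C>T changes at dipyrimidines are all assumptions chosen for illustration. The sketch counts the expected non-synonymous and synonymous sites of a coding sequence under a given substitution-probability model, then normalizes the observed counts by them.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def uniform_weight(prev, ref, alt, nxt):
    """Basic model: every base substitution is equally likely."""
    return 1.0


def uv_like_weight(prev, ref, alt, nxt):
    """Toy UV-like model (assumption): C>T at a dipyrimidine is 10x more
    likely than any other change. The G>A case covers the same event on
    the opposite strand. The factor 10 is arbitrary, not a fitted value."""
    if ref == "C" and alt == "T" and (prev in "CT" or nxt in "CT"):
        return 10.0
    if ref == "G" and alt == "A" and (prev in "AG" or nxt in "AG"):
        return 10.0
    return 1.0


def expected_sites(cds, weight):
    """Return (non-synonymous, synonymous) expected sites, each possible
    single-base change being weighted by the substitution model."""
    if len(cds) % 3 != 0 or set(cds) - set(BASES):
        raise ValueError("cds must be in-frame and contain only A, C, G, T")
    n_sites = s_sites = 0.0
    for i, ref in enumerate(cds):
        pos = i % 3
        codon = cds[i - pos:i - pos + 3]
        # "N" marks a missing neighbor at either end of the sequence.
        prev = cds[i - 1] if i > 0 else "N"
        nxt = cds[i + 1] if i + 1 < len(cds) else "N"
        for alt in BASES:
            if alt == ref:
                continue
            mutated = codon[:pos] + alt + codon[pos + 1:]
            w = weight(prev, ref, alt, nxt)
            if CODON_TABLE[mutated] == CODON_TABLE[codon]:
                s_sites += w
            else:
                n_sites += w  # nonsense changes count as non-synonymous
    return n_sites, s_sites


def dn_ds(n_obs, s_obs, cds, weight):
    """dN/dS: observed counts normalized by the expected site counts."""
    n_sites, s_sites = expected_sites(cds, weight)
    return (n_obs / n_sites) / (s_obs / s_sites)


# Toy gene encoding only hydrophobic residues (Leu-Phe-Leu-Phe).
toy_cds = "CTCTTCCTCTTC"
uniform = dn_ds(10, 8, toy_cds, uniform_weight)
corrected = dn_ds(10, 8, toy_cds, uv_like_weight)
```

For this toy sequence and the hypothetical counts of 10 non-synonymous and 8 synonymous mutations, the uniform model yields a dN/dS well below 1, which would be read as purifying selection, while the UV-like model yields a value slightly above 1. The difference arises only because the weighted model expects many more synonymous C>T changes at the third positions of these codons.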