Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA and Department of Computer Science, Princeton University, Princeton, NJ 08544, USA.
Nucleic Acids Res. 2014 Feb;42(3):e18. doi: 10.1093/nar/gkt1305. Epub 2013 Dec 19.
A major challenge in cancer genomics is uncovering genes with an active role in tumorigenesis from a potentially large pool of mutated genes across patient samples. Here we focus on the interactions that proteins make with nucleic acids, small molecules, ions and peptides, and show that residues within proteins that are involved in these interactions are more frequently affected by mutations observed in large-scale cancer genomic data than are other residues. We leverage this observation to predict genes that play a functionally important role in cancers by introducing a computational pipeline (http://canbind.princeton.edu) for mapping large-scale cancer exome data across patients onto protein structures, and automatically extracting proteins with an enriched number of mutations affecting their nucleic acid, small molecule, ion or peptide binding sites. Using this computational approach, we show that many previously known genes implicated in cancers are enriched in mutations within the binding sites of their encoded proteins. By focusing on functionally relevant portions of proteins, specifically those known to be involved in molecular interactions, our approach is particularly well suited to detect infrequent mutations that may nonetheless be important in cancer, and should aid in expanding our functional understanding of the genomic landscape of cancer.
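The enrichment idea described in the abstract can be illustrated with a small sketch. The abstract does not state which statistical test the pipeline uses, so the one-sided binomial test below is an assumption made purely for illustration, as are the function name `binding_site_enrichment`, its parameters, and the toy positions in the usage example. It asks how surprising it would be to see at least the observed number of mutations land on binding-site residues if mutations fell uniformly at random along the protein.

```python
from math import comb


def binding_site_enrichment(protein_length, binding_sites, mutated_positions):
    """Return (k, n, p_value) for a one-sided binomial enrichment test.

    Hypothetical helper, not the published pipeline's method.
    protein_length    -- number of residues in the protein
    binding_sites     -- iterable of 1-based residue positions annotated as binding
    mutated_positions -- list of 1-based positions, one entry per observed
                         mutation (repeats allowed for recurrent mutations)
    """
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")

    # Keep only binding-site positions that lie inside the protein.
    sites = {p for p in binding_sites if 1 <= p <= protein_length}

    n = len(mutated_positions)                           # total mutations
    k = sum(1 for p in mutated_positions if p in sites)  # mutations in sites

    # Null model (an assumption): every residue is equally likely to be mutated.
    p_site = len(sites) / protein_length

    # P(X >= k) for X ~ Binomial(n, p_site).
    p_value = sum(
        comb(n, i) * p_site**i * (1 - p_site) ** (n - i)
        for i in range(k, n + 1)
    )
    return k, n, p_value


# Toy usage: a 100-residue protein with 10 binding-site residues and
# 4 observed mutations, 3 of which fall in the binding site.
k, n, p = binding_site_enrichment(100, range(1, 11), [1, 2, 3, 50])
print(k, n, round(p, 4))  # p is about 0.0037
```

A uniform per-residue background is the simplest possible null model; a real analysis would likely need to account for sequence context, gene-specific mutation rates, and multiple testing across proteins.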