Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD 20850, USA.
J Mol Biol. 2011 Oct 21;413(2):495-512. doi: 10.1016/j.jmb.2011.06.046. Epub 2011 Jul 13.
A number of large-scale cancer somatic genome sequencing projects are now identifying genetic alterations in cancers. Evaluation of the effects of these mutations is essential for understanding their contribution to tumorigenesis. We have used SNPs3D, a software suite originally developed for analyzing nonsynonymous germ-line variants, to identify single-base mutations with a high impact on protein structure and function. Two machine learning methods are used: one identifies mutations that destabilize protein three-dimensional structure, and the other uses sequence conservation to detect all types of effects on in vivo protein function. Incorporating detailed structure information into the analysis allows close interpretation of the functional effects of mutations in specific cases. Data from a set of breast and colorectal tumors were analyzed. In known cancer genes, approaching 100% of mutations are found to impact protein function, supporting the view that these methods are appropriate for identifying driver mutations. Overall, 50-60% of all somatic missense mutations are predicted to have a high impact on structural stability or to more generally affect the function of the corresponding proteins. This value is similar to the fraction of all possible missense mutations that have a high impact and is much higher than the corresponding value for human population single-nucleotide polymorphisms, which is about 30%. The majority of mutations in tumor suppressors destabilize protein structure, while mutations in oncogenes operate in more varied ways, including destabilization of less active conformational states. The set of high-impact mutations encompasses the possible drivers.