Gress A, Ramensky V, Kalinina O V
Department for Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany.
Graduate School of Computer Science, Saarland University, Saarbrücken, Germany.
Oncogenesis. 2017 Sep 25;6(9):e380. doi: 10.1038/oncsis.2017.79.
Next-generation sequencing enables simultaneous analysis of hundreds of human genomes associated with a particular phenotype, for example, a disease. These genomes naturally contain a lot of sequence variation that ranges from single-nucleotide variants (SNVs) to large-scale structural rearrangements. In order to establish a functional connection between genotype and disease-associated phenotypes, one needs to distinguish disease drivers from neutral passenger variants. Functional annotation based on experimental assays is feasible only for a limited number of candidate mutations. Thus, alternative computational tools are needed. A possible approach to annotating mutations functionally is to consider their spatial location relative to functionally relevant sites in three-dimensional (3D) structures of the harboring proteins. This is impeded by the lack of available protein 3D structures. Complementing experimentally resolved structures with reliable computational models is an attractive alternative. We developed a structure-based approach to characterizing comprehensive sets of non-synonymous single-nucleotide variants (nsSNVs): those associated with cancer, those associated with non-cancer diseases, and those that are putatively functionally neutral. We searched experimentally resolved protein 3D structures for potential homology-modeling templates for proteins harboring the corresponding mutations. We found such templates for all proteins with disease-associated nsSNVs, and for 51% and 66% of proteins carrying common polymorphisms and annotated benign variants, respectively. Many mutations caused by nsSNVs can be found in protein-protein, protein-nucleic acid or protein-ligand complexes. Correction for the number of available templates per protein reveals that protein-protein interaction interfaces are not enriched in either cancer nsSNVs or nsSNVs associated with non-cancer diseases. Whereas cancer-associated mutations are enriched in DNA-binding proteins, they are rarely located directly in DNA-interacting interfaces. In contrast, mutations associated with non-cancer diseases are in general rare in DNA-binding proteins, but enriched in the DNA-interacting interfaces of these proteins. All disease-associated nsSNVs are overrepresented in ligand-binding pockets, and nsSNVs associated with non-cancer diseases are additionally enriched in the protein core, where they probably affect overall protein stability.