Stitziel Nathan O, Tseng Yan Yuan, Pervouchine Dimitri, Goddeau David, Kasif Simon, Liang Jie
Department of Bioengineering SEO, MC-063, University of Illinois at Chicago, Room 218, 851 S. Morgan Street, Chicago, IL 60607-7052, USA.
J Mol Biol. 2003 Apr 11;327(5):1021-30. doi: 10.1016/s0022-2836(03)00240-7.
Non-synonymous single-nucleotide polymorphisms (nsSNPs) in genes introduce amino acid changes into proteins and play an important role in providing genetic functional diversity. To understand the structural characteristics of disease-associated SNPs, we mapped a set of nsSNPs derived from the Online Mendelian Inheritance in Man (OMIM) database onto the structural surfaces of the encoded proteins. These nsSNPs are disease-associated or have distinctive phenotypes. As a control dataset, we mapped a set of nsSNPs derived from the SNP database dbSNP onto the structural surfaces of the corresponding encoded proteins. Using the alpha shape method from computational geometry, we examined the geometric locations of the structural sites of these nsSNPs. We classified each nsSNP site into one of three categories of geometric location: sites in a pocket or a void (type P); sites on a convex region or a shallow depressed region (type S); and sites buried completely in the interior (type I). We find that the majority (88%) of disease-associated nsSNPs are located in voids or pockets, and that they are infrequently observed in the interior of proteins (3.2% in the dataset). In contrast, nsSNPs mapped from dbSNP are less likely to be located in pockets or voids (68%). We further introduce a novel application of hidden Markov models (HMMs) for analyzing the sequence homology of SNPs at different geometric sites. For SNPs in surface pockets or voids, we find no strong tendency for them to occur at conserved residues. For SNPs buried in the interior, we find that disease-associated mutations are more likely to occur at conserved residues. The approach developed in this study, which classifies nsSNPs using alpha shapes and HMMs, can be integrated with additional methods to improve the accuracy of predicting whether a given nsSNP is likely to be disease-associated.
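The three-way P/S/I labelling of nsSNP sites can be illustrated as a simple decision rule followed by a tally of category fractions. This is a minimal sketch under stated assumptions, not the authors' implementation. It does not compute alpha shapes itself. Instead, it assumes each residue already carries two hypothetical boolean annotations (`in_pocket_or_void`, `solvent_exposed`) produced by some upstream alpha-shape analysis. The names `ResidueGeometry`, `classify_site`, and `type_fractions` are illustrative only. Pocket/void membership is checked first because a void is an interior cavity, so a residue lining a void would otherwise look buried.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class SiteType(Enum):
    P = "pocket or void"
    S = "convex or shallow depressed surface"
    I = "buried interior"


@dataclass(frozen=True)
class ResidueGeometry:
    # Hypothetical per-residue annotations, ASSUMED to be precomputed by an
    # alpha-shape analysis of the protein structure (not computed here).
    in_pocket_or_void: bool  # residue lines a surface pocket or an interior void
    solvent_exposed: bool    # residue has nonzero solvent-accessible area


def classify_site(geom: ResidueGeometry) -> SiteType:
    """Assign one of the three geometric categories to a residue site."""
    if geom.in_pocket_or_void:
        # Voids are interior cavities, so test this before the burial check.
        return SiteType.P
    if geom.solvent_exposed:
        return SiteType.S
    return SiteType.I


def type_fractions(sites: list[ResidueGeometry]) -> dict[SiteType, float]:
    """Return the fraction of sites falling into each category."""
    counts = Counter(classify_site(g) for g in sites)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sites to classify")
    return {t: counts[t] / total for t in SiteType}


if __name__ == "__main__":
    # Toy input for illustration only; these are not data from the study.
    toy = [
        ResidueGeometry(in_pocket_or_void=True, solvent_exposed=True),
        ResidueGeometry(in_pocket_or_void=False, solvent_exposed=True),
        ResidueGeometry(in_pocket_or_void=False, solvent_exposed=False),
        ResidueGeometry(in_pocket_or_void=True, solvent_exposed=False),
    ]
    print({t.name: f for t, f in type_fractions(toy).items()})
    # prints {'P': 0.5, 'S': 0.25, 'I': 0.25}
```

Running `type_fractions` separately on the OMIM-derived and dbSNP-derived site sets would give the kind of per-dataset proportions the abstract compares (for example, the share of type P sites).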