Investigating single nucleotide polymorphism (SNP) density in the human genome and its implications for molecular evolution.

Author information

Zhao Zhongming, Fu Yun-Xin, Hewett-Emmett David, Boerwinkle Eric

Affiliation

Human Genetics Center, 1200 Herman Pressler, Suite E447, University of Texas Health Science Center at Houston, Houston, TX 77030, USA.

Publication information

Gene. 2003 Jul 17;312:207-13. doi: 10.1016/s0378-1119(03)00670-x.

DOI:10.1016/s0378-1119(03)00670-x

PMID:12909357

Abstract

We investigated single nucleotide polymorphism (SNP) density across the human genome and in different genic categories using two SNP databases: Celera's CgsSNP, which includes SNPs identified by comparing genomic sequences, and Celera's RefSNP, which includes SNPs from a variety of sources and is biased toward disease-associated genes. Based on CgsSNP, the average number of SNPs per 10 kb was 8.33, 8.44, and 8.09 in the human genome, in intergenic regions, and in genic regions, respectively. Within genic regions, the SNP density in intronic, exonic, and adjoining untranslated regions was 8.21, 5.28, and 7.51 SNPs per 10 kb, respectively. The pattern of SNP density based on RefSNP differed from that based on CgsSNP, emphasizing that RefSNP is useful for genotype-phenotype association studies but not for most population genetic studies. The number of SNPs per chromosome was correlated with chromosome length, but the density of SNPs estimated from CgsSNP was not significantly correlated with the GC content of the chromosome. Based on CgsSNP, the ratio of nonsense to missense mutations (0.027), the ratio of missense to silent mutations (1.15), and the ratio of non-synonymous to synonymous mutations (1.18) were each less than half of that expected in a human protein-coding sequence under the neutral mutation theory, reflecting a role for natural selection, especially purifying selection.
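The arithmetic behind the reported figures can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the SNP count and region length in the density example are hypothetical, and the sketch assumes that "silent" and "synonymous" refer to the same class of mutations and that non-synonymous mutations are the sum of missense and nonsense mutations. Under those assumptions, the three ratios quoted in the abstract are mutually consistent.

```python
def snps_per_10kb(snp_count: int, region_length_bp: int) -> float:
    """Return SNP density expressed as SNPs per 10 kb of sequence."""
    if region_length_bp <= 0:
        raise ValueError("region_length_bp must be positive")
    return snp_count / region_length_bp * 10_000


def nonsyn_to_syn_ratio(nonsense_to_missense: float,
                        missense_to_silent: float) -> float:
    """Derive the non-synonymous/synonymous ratio from the two other ratios.

    Assumes non-synonymous = missense + nonsense and synonymous = silent, so
    (missense + nonsense) / silent = (missense / silent) * (1 + nonsense / missense).
    """
    return missense_to_silent * (1.0 + nonsense_to_missense)


# Hypothetical counts chosen only to illustrate the unit conversion.
density = snps_per_10kb(snp_count=833, region_length_bp=1_000_000)
print(round(density, 2))  # 8.33

# Ratios taken from the abstract (CgsSNP): 0.027 and 1.15.
ratio = nonsyn_to_syn_ratio(0.027, 1.15)
print(round(ratio, 2))  # 1.18
```

The derived value of about 1.18 matches the non-synonymous to synonymous ratio stated in the abstract, which suggests the three reported ratios were computed from the same mutation counts.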
