Robert J. Livingston, Andrew von Niederhausern, Anil G. Jegga, Dana C. Crawford, Christopher S. Carlson, Mark J. Rieder, Sivakumar Gowrisankar, Bruce J. Aronow, Robert B. Weiss, Deborah A. Nickerson
Department of Genome Sciences, University of Washington, Seattle, Washington 98195-7730, USA.
Genome Res. 2004 Oct;14(10A):1821-31. doi: 10.1101/gr.2730004. Epub 2004 Sep 13.
To promote clinical and epidemiological studies that improve our understanding of human genetic susceptibility to environmental exposure, the Environmental Genome Project (EGP) has scanned 213 environmental response genes involved in DNA repair, cell cycle regulation, apoptosis, and metabolism for single nucleotide polymorphisms (SNPs). Many of these genes have been implicated by loss-of-function mutations associated with severe diseases attributable to decreased protection of genomic integrity. The hypothesis for these studies is therefore that individuals with functionally significant polymorphisms within these genes may be particularly susceptible to genotoxic environmental agents. On average, 20.4 kb of baseline genomic sequence, or 86% of each gene, was scanned for variations in the 90 samples of the Polymorphism Discovery Resource panel; the scanned sequence included a substantial portion of the introns, all exons, and 1.3 kb upstream and downstream. The average nucleotide diversity across the 4.2 Mb of these 213 genes is 6.7 × 10⁻⁴, or approximately one SNP every 1500 bp, when two random chromosomes are compared. The average candidate environmental response gene contains 26 PHASE-inferred haplotypes, 34 common SNPs, 6.2 coding SNPs (cSNPs), and 2.5 nonsynonymous cSNPs. SIFT and PolyPhen analysis of 541 nonsynonymous cSNPs identified 57 potentially deleterious SNPs. An additional eight polymorphisms are predicted to alter protein translation. Because these genes represent 1% of all known human genes, extrapolation from these data predicts the total genomic set of cSNPs, nonsynonymous cSNPs, and potentially deleterious nonsynonymous cSNPs. The implications for the use of these data in direct and indirect association studies of environmentally induced diseases are discussed.
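The arithmetic behind several figures in the abstract can be checked with a minimal Python sketch. The constants are taken directly from the abstract; the function names are hypothetical, and the final scaling step is only a naive linear extrapolation based on the stated 1% gene fraction, not necessarily the method the authors used.

```python
# Figures quoted in the abstract.
NUCLEOTIDE_DIVERSITY = 6.7e-4   # average per-site diversity between two random chromosomes
N_GENES = 213                   # environmental response genes scanned
KB_SCANNED_PER_GENE = 20.4      # average scanned sequence per gene, in kb
NONSYNONYMOUS_CSNPS = 541       # nonsynonymous cSNPs analyzed with SIFT and PolyPhen
POTENTIALLY_DELETERIOUS = 57    # of those, flagged as potentially deleterious
GENE_FRACTION = 0.01            # the 213 genes as a share of all known human genes


def bp_per_snp(diversity: float) -> float:
    """Expected spacing (in bp) between differences for a given per-site diversity."""
    return 1.0 / diversity


def total_scanned_mb(n_genes: int, kb_per_gene: float) -> float:
    """Total scanned sequence in megabases, from the per-gene average."""
    return n_genes * kb_per_gene / 1000.0


def flagged_fraction(flagged: int, total: int) -> float:
    """Share of analyzed variants that were flagged."""
    return flagged / total


def naive_genome_wide(count: float, gene_fraction: float) -> float:
    """Assumption: simple linear scaling from the sampled genes to all genes."""
    return count / gene_fraction


if __name__ == "__main__":
    print(f"bp per SNP: {bp_per_snp(NUCLEOTIDE_DIVERSITY):.0f}")
    print(f"scanned sequence: {total_scanned_mb(N_GENES, KB_SCANNED_PER_GENE):.2f} Mb")
    share = flagged_fraction(POTENTIALLY_DELETERIOUS, NONSYNONYMOUS_CSNPS)
    print(f"potentially deleterious share: {share:.1%}")
    scaled = naive_genome_wide(POTENTIALLY_DELETERIOUS, GENE_FRACTION)
    print(f"naive genome-wide estimate: {scaled:.0f}")
```

The reciprocal of the diversity gives a spacing just under 1500 bp, which matches the rounded figure in the abstract. The product of gene count and average scanned length comes out slightly above the reported 4.2 Mb, which is expected when working from rounded averages.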