Genotype prediction using a dense map of SNPs.

Authors

Evans David M, Cardon Lon R, Morris Andrew P

Affiliation

Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK.

Publication

Genet Epidemiol. 2004 Dec;27(4):375-84. doi: 10.1002/gepi.20045.

Abstract

The International Haplotype Mapping Project (HapMap) aims to characterize the distribution and extent of linkage disequilibrium (LD) throughout the human genome, thereby facilitating genome-wide association analysis and the search for the genetic determinants of complex diseases. Implicit in the rationale behind the project is the expectation that hidden (unobserved) disease-causing variants will be in significant LD with surrounding typed markers and will thus be amenable to detection using association-based mapping approaches. In order to investigate the validity of this assumption, we examined more than 5,000 SNPs across a 10-MB region of chromosome 20 in a sample of 96 unrelated African-American and 96 unrelated Caucasian individuals. We treated observed loci as surrogates for hidden SNPs by pretending that individuals' genotypes were unknown. We then attempted to predict these genotypes at the surrogate hidden SNP by using information about LD in the region and genotypes at surrounding observed loci. Our method is based on finding the most likely genotype for each individual, given all possible haplotype pairs consistent with observed genotypes for that individual at surrounding loci, and given the frequencies of those haplotypes in an independent sample. Our method performs extremely well in predicting genotypes in areas of high LD. Furthermore, in areas of low LD, our method results in substantial gains in predictive accuracy as compared to pair-wise strategies. These results suggest that pair-wise tests of disease-marker association may be inferior to multipoint methods, which take advantage of the information contained within multi-locus haplotypes.
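The prediction step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `predict_hidden_genotype`, the 0/1 allele coding, the toy haplotype frequencies, and the assumption that an individual's two haplotypes are drawn independently (Hardy-Weinberg equilibrium) are all introduced here for illustration. The sketch enumerates every haplotype pair consistent with the genotypes observed at the surrounding loci, weights each pair by the product of its reference-sample frequencies, and reports the most probable genotype at the hidden SNP.

```python
from itertools import product


def predict_hidden_genotype(hap_freqs, observed, hidden_index):
    """Return (most likely hidden genotype, posterior over genotypes 0/1/2).

    hap_freqs    : dict mapping a haplotype (tuple of 0/1 alleles covering all
                   loci, hidden locus included) to its frequency in an
                   independent reference sample (hypothetical input format).
    observed     : dict mapping locus index -> observed genotype, coded as the
                   number of copies of allele 1 (0, 1 or 2).
    hidden_index : index of the locus whose genotype is treated as unknown.
    """
    posterior = {0: 0.0, 1: 0.0, 2: 0.0}

    # Enumerate ordered haplotype pairs; assumption: the two haplotypes are
    # independent draws, so a pair has probability f1 * f2.
    for (h1, f1), (h2, f2) in product(hap_freqs.items(), repeat=2):
        # Keep only pairs consistent with the genotypes at the observed loci.
        if all(h1[i] + h2[i] == g for i, g in observed.items()):
            posterior[h1[hidden_index] + h2[hidden_index]] += f1 * f2

    total = sum(posterior.values())
    if total == 0.0:
        # No reference haplotype pair explains the observed genotypes.
        return None, posterior

    posterior = {g: p / total for g, p in posterior.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


# Toy reference panel over three loci; locus 1 plays the hidden SNP.
# These frequencies are invented for the example, not taken from the study.
reference = {
    (0, 0, 0): 0.5,
    (1, 1, 1): 0.3,
    (1, 0, 1): 0.1,
    (0, 1, 0): 0.1,
}

# The individual is heterozygous at both flanking loci.
best, posterior = predict_hidden_genotype(reference, {0: 1, 2: 1}, hidden_index=1)
print(best, posterior)
```

In this toy panel the flanking genotypes alone do not determine the hidden genotype, but the haplotype frequencies favour the heterozygote. A pair-wise strategy would use only one flanking marker at a time, whereas the enumeration above uses all of them jointly.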
