Multiple subsampling of dense SNP data localizes disease genes with increased precision.

Authors

Stewart William C L, Peljto Anna L, Greenberg David A

Affiliation

Columbia University, Mailman School of Public Health, Division of Statistical Genetics, Department of Biostatistics, 722 W. 168th Street, 6th floor, New York, NY 10032, USA.

Publication information

Hum Hered. 2010;69(3):152-9. doi: 10.1159/000267995. Epub 2009 Dec 18.

DOI:10.1159/000267995

PMID:20029227

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2918647/

Abstract

BACKGROUND/AIMS

Current linkage studies detect and localize trait loci using genotypes sampled at hundreds of thousands of single nucleotide polymorphisms (SNPs). Such data should provide precise estimates of trait location once linkage has been established. However, correlations between nearby SNPs can distort the information about trait location. Traditionally, when faced with this dilemma, three approaches have been used: (1) ignore the correlation; (2) approximate the correlation; or, (3) analyze a single, approximately uncorrelated subset of the original dense data.

METHODS

Here, we examine and test a simple and efficient estimator of trait location that averages location estimates across random subsamples of the original dense data. Based on pairwise estimates of correlation, we ensure that the SNPs within each subsample are approximately uncorrelated. In addition, we use the nonparametric bootstrap procedure to compute narrow, high-resolution candidate gene regions (i.e. confidence intervals for the true trait location).
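The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how subsamples are drawn, how location is estimated within a subsample, or what the bootstrap resamples. Here a subsample is built greedily from a random SNP order using a pairwise r² cutoff (`max_r2`), the per-subsample estimator `toy_location` is a hypothetical placeholder for a real linkage analysis, whole families are assumed to be the bootstrap resampling unit, and the interval is a simple percentile interval. All data are simulated.

```python
import random
from statistics import mean


def uncorrelated_subsample(r2, max_r2, rng):
    """Visit SNPs in random order; keep one only if its pairwise r^2 with
    every SNP already kept is at most max_r2 (assumed selection rule)."""
    order = list(range(len(r2)))
    rng.shuffle(order)
    kept = []
    for j in order:
        if all(r2[j][k] <= max_r2 for k in kept):
            kept.append(j)
    return sorted(kept)


def toy_location(families, positions, subset):
    """HYPOTHETICAL stand-in for a linkage analysis: return the position of
    the subset SNP whose score, summed over families, is largest."""
    best = max(subset, key=lambda j: sum(fam[j] for fam in families))
    return positions[best]


def averaged_location(families, positions, r2, n_subsamples, max_r2, rng):
    """Average the location estimates obtained from random subsamples."""
    estimates = [
        toy_location(families, positions, uncorrelated_subsample(r2, max_r2, rng))
        for _ in range(n_subsamples)
    ]
    return mean(estimates)


def bootstrap_interval(families, positions, r2, n_boot, n_subsamples, max_r2,
                       rng, level=0.95):
    """Percentile interval from resampling whole families with replacement
    (the resampling unit is an assumption of this sketch)."""
    reps = sorted(
        averaged_location([rng.choice(families) for _ in families],
                          positions, r2, n_subsamples, max_r2, rng)
        for _ in range(n_boot)
    )
    lo = reps[int((1 - level) / 2 * (n_boot - 1))]
    hi = reps[int((1 + level) / 2 * (n_boot - 1))]
    return lo, hi


# Simulated toy data; every value below is an illustrative assumption.
rng = random.Random(0)
positions = [0.1 * i for i in range(60)]                 # map positions in cM
r2 = [[0.5 ** abs(i - j) for j in range(60)] for i in range(60)]
true_location = 3.0
families = [                                             # per-family SNP scores
    [-(p - true_location) ** 2 + rng.gauss(0, 1) for p in positions]
    for _ in range(15)
]

estimate = averaged_location(families, positions, r2, 50, 0.1, rng)
low, high = bootstrap_interval(families, positions, r2, 100, 20, 0.1, rng)
print(f"estimate = {estimate:.2f} cM, 95% interval = ({low:.2f}, {high:.2f}) cM")
```

Averaging over many subsamples lets every SNP contribute to the final estimate while each individual analysis still sees only approximately uncorrelated markers, which is the contrast with analyzing a single subset.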

RESULTS

Using simulated data, we show that the three existing approaches to dense SNP linkage analysis (described above) can yield biased and/or inefficient estimation, depending on the underlying correlation structure. With respect to mean squared error, our estimator outperforms the third approach and is at least as good as, and usually better than, the first and second approaches. In an analysis of 15 hypertension families genotyped at approximately 500,000 SNPs, our estimator reduced the length of the candidate gene region by 47.5% relative to the third approach.

CONCLUSION

The method we developed will be an important tool for constructing high-resolution candidate gene regions that could ultimately aid in targeting regions for sequencing projects.
