Inference and analysis of haplotypes from combined genotyping studies deposited in dbSNP.

Authors

Zaitlen Noah A, Kang Hyun Min, Feolo Michael L, Sherry Stephen T, Halperin Eran, Eskin Eleazar

Affiliation

Bioinformatics Program, University of California, San Diego, La Jolla, California 92093, USA.

Publication information

Genome Res. 2005 Nov;15(11):1594-600. doi: 10.1101/gr.4297805.

DOI:10.1101/gr.4297805

PMID:16251470

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1310648/

Abstract

In the effort to understand human variation and the genetic basis of complex disease, a tremendous number of single nucleotide polymorphisms (SNPs) have been discovered and deposited into NCBI's dbSNP public database. More than 2.7 million SNPs in the database have genotype information. These data provide an invaluable resource for understanding the structure of human variation and for designing genetic association studies. The genotypes deposited in dbSNP are unphased, and thus the haplotype information is unknown. We applied the phasing method HAP to obtain the haplotype information, block partitions, and tag SNPs for all publicly available genotype data and deposited this information into the dbSNP database. We also deposited the orthologous chimpanzee reference sequence for each predicted haplotype block, computed using the UCSC BLASTZ alignments of human and chimpanzee. Using dbSNP, researchers can now easily perform analyses using multiple genotype data sets from the same genomic regions. Dense and sparse genotype data sets from the same region were combined to show that the number of common haplotypes is significantly underestimated in whole-genome data sets, while the predicted haplotypes over the common SNPs are consistent between studies. To validate the accuracy of the predictions, we benchmarked HAP's running time and phasing accuracy against PHASE. Although HAP is slightly less accurate than PHASE, HAP is over 1000 times faster than PHASE, making it suitable for application to the entire set of genotypes in dbSNP.
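To make the phasing problem in the abstract concrete, the following toy sketch enumerates every haplotype pair consistent with one unphased genotype, and then shows how restricting haplotypes to a sparser SNP subset can merge distinct haplotypes. It is not the HAP algorithm or PHASE; the function names, the genotype encoding (`0` and `1` for the two homozygous states, `2` for heterozygous), and the example data are all assumptions made for illustration.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    Assumed encoding (illustrative only): '0' and '1' are the two homozygous
    states, '2' marks a heterozygous site whose phase is unknown.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == "2"]
    pairs = set()
    # Try every allele assignment at the heterozygous sites for haplotype 1;
    # haplotype 2 must carry the complementary allele at each such site.
    for bits in product("01", repeat=len(het_sites)):
        h1 = list(genotype)
        h2 = list(genotype)
        for site, allele in zip(het_sites, bits):
            h1[site] = allele
            h2[site] = "1" if allele == "0" else "0"
        # Sort within the pair so (a, b) and (b, a) count once.
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


def project_haplotypes(haplotypes, kept_sites):
    """Return the distinct haplotypes seen when only `kept_sites` are typed."""
    return {"".join(h[i] for i in kept_sites) for h in haplotypes}


if __name__ == "__main__":
    # Two heterozygous sites give two possible phasings.
    for pair in compatible_haplotype_pairs("0212"):
        print(pair)

    dense = ["0010", "0111", "0011", "0110"]
    sparse = project_haplotypes(dense, kept_sites=[0, 1])
    print(len(set(dense)), "distinct haplotypes over 4 SNPs")
    print(len(sparse), "distinct haplotypes over the first 2 SNPs")
```

A genotype with k heterozygous sites admits 2^(k-1) candidate pairs for k >= 1, which is why a phasing method has to use population-level information to choose among them. The projection step illustrates one plausible way a sparse data set could report fewer distinct haplotypes than a dense one over the same region.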

