Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering.

Author information

Browning Sharon R, Browning Brian L

Affiliation

Department of Statistics, The University of Auckland, Auckland, New Zealand.

Publication information

Am J Hum Genet. 2007 Nov;81(5):1084-97. doi: 10.1086/521987. Epub 2007 Sep 21.

Abstract

Whole-genome association studies present many new statistical and computational challenges due to the large quantity of data obtained. One of these challenges is haplotype inference; methods for haplotype inference designed for small data sets from candidate-gene studies do not scale well to the large number of individuals genotyped in whole-genome association studies. We present a new method and software for inference of haplotype phase and missing data that can accurately phase data from whole-genome association studies, and we present the first comparison of haplotype-inference methods for real and simulated data sets with thousands of genotyped individuals. We find that our method outperforms existing methods in terms of both speed and accuracy for large data sets with thousands of individuals and densely spaced genetic markers, and we use our method to phase a real data set of 3,002 individuals genotyped for 490,032 markers in 3.1 days of computing time, with 99% of masked alleles imputed correctly. Our method is implemented in the Beagle software package, which is freely available.
