Understanding the accuracy of statistical haplotype inference with sequence data of known phase.

Authors

Andrés Aida M, Clark Andrew G, Shimmin Lawrence, Boerwinkle Eric, Sing Charles F, Hixson James E

Affiliation

Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA.

Publication information

Genet Epidemiol. 2007 Nov;31(7):659-71. doi: 10.1002/gepi.20185.

Abstract

Statistical methods for haplotype inference from multi-site genotypes of unrelated individuals have important application in association studies and population genetics. Understanding the factors that affect the accuracy of this inference is important, but their assessment has been restricted by the limited availability of biological data with known phase. We created hybrid cell lines monosomic for human chromosome 19 and produced single-chromosome complete sequences of a 48 kb genomic region in 39 individuals of African American (AA) and European American (EA) origin. We employ these phase-known genotypes and coalescent simulations to assess the accuracy of statistical haplotype reconstruction by several algorithms. Accuracy of phase inference was considerably low in our biological data even for regions as short as 25-50 kb, suggesting that caution is needed when analyzing reconstructed haplotypes. Moreover, the reliability of estimated confidence in phase inference is not high enough to allow for a reliable incorporation of site-specific uncertainty information in subsequent analyses. We show that, in samples of certain mixed ancestry (AA and EA populations), the most accurate haplotypes are probably obtained when increasing sample size by considering the largest, pooled sample, despite the hypothetical problems associated with pooling across those heterogeneous samples. Strategies to improve confidence in reconstructed haplotypes, and realistic alternatives to the analysis of inferred haplotypes, are discussed.
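The abstract does not say which accuracy measure was used. As an illustration only, the sketch below computes one common measure, the switch error rate, by comparing an inferred haplotype pair against a phase-known pair for one individual. The function name `switch_error_rate` and the toy sequences are hypothetical and are not taken from the paper.

```python
def switch_error_rate(true_pair, inferred_pair):
    """Fraction of adjacent heterozygous-site pairs whose relative phase is wrong.

    Each pair is two equal-length strings of alleles (one per chromosome).
    Returns None when fewer than two heterozygous sites exist, because
    phase is then undefined.
    """
    t1, t2 = true_pair
    g1, g2 = inferred_pair
    if not (len(t1) == len(t2) == len(g1) == len(g2)):
        raise ValueError("all haplotypes must have the same length")

    # The inferred pair must describe the same unphased genotype at every site.
    for i in range(len(t1)):
        if sorted((t1[i], t2[i])) != sorted((g1[i], g2[i])):
            raise ValueError(f"genotype mismatch at site {i}")

    # Only heterozygous sites carry phase information.
    het_sites = [i for i in range(len(t1)) if t1[i] != t2[i]]
    if len(het_sites) < 2:
        return None

    # At each heterozygous site, record whether inferred copy 1 follows true copy 1.
    follows_first = [g1[i] == t1[i] for i in het_sites]

    # A switch error occurs whenever that orientation flips between neighbours.
    switches = sum(a != b for a, b in zip(follows_first, follows_first[1:]))
    return switches / (len(het_sites) - 1)


# Toy data (assumed for illustration): four heterozygous sites, one phase switch.
true_pair = ("ACGTA", "GTGCC")
inferred_pair = ("ACGCC", "GTGTA")
rate = switch_error_rate(true_pair, inferred_pair)
```

Labelling of the two chromosomes is arbitrary, so an inferred pair that is simply the true pair in reverse order scores zero switch errors under this measure.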




