Ascertainment bias in studies of human genome-wide polymorphism.

Author information

Clark Andrew G, Hubisz Melissa J, Bustamante Carlos D, Williamson Scott H, Nielsen Rasmus

Affiliations

Molecular Biology and Genetics and Computational Biology, Cornell University, Ithaca, New York 14853, USA.

Publication information

Genome Res. 2005 Nov;15(11):1496-502. doi: 10.1101/gr.4107905.

DOI:10.1101/gr.4107905

PMID:16251459

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC1310637/

Abstract

Large-scale SNP genotyping studies rely on an initial assessment of nucleotide variation to identify sites in the DNA sequence that harbor variation among individuals. This "SNP discovery" sample may be quite variable in size and composition, and it has been well established that properties of the SNPs that are found are influenced by the discovery sampling effort. The International HapMap project relied on nearly any piece of information available to identify SNPs (including BAC end sequences, shotgun reads, and differences between public and private sequences) and even made use of chimpanzee data to confirm human sequence differences. In addition, the ascertainment criteria shifted from using only SNPs that had been validated in population samples, to double-hit SNPs, to finally accepting SNPs that were singletons in small discovery samples. In contrast, Perlegen's primary discovery was a resequencing-by-hybridization effort using the 24 people of diverse origin in the Polymorphism Discovery Resource. Here we take these two data sets and contrast two basic summary statistics, heterozygosity and F_ST, as well as the site frequency spectra, for 500-kb windows spanning the genome. The magnitude of disparity between these samples in these measures of variability indicates that population genetic analysis on the raw genotype data is ill advised. Given the knowledge of the discovery samples, we perform an ascertainment correction and show how the post-correction data are more consistent across these studies. However, discrepancies persist, suggesting that the heterogeneity in the SNP discovery process of the HapMap project resulted in a data set resistant to complete ascertainment correction. Ascertainment bias will likely erode the power of tests of association between SNPs and complex disorders, but the effect will likely be small, and perhaps more importantly, it is unlikely that the bias will introduce false-positive inferences.
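
To illustrate what an ascertainment correction of a site frequency spectrum can look like, here is a minimal sketch. It is not the authors' procedure, which the abstract does not spell out. It assumes a deliberately simple discovery scheme: a SNP is included only if it was variable in a random subsample of d chromosomes drawn from the n genotyped chromosomes. Under that assumption, the chance of discovering a site with i copies of an allele follows from hypergeometric sampling, and each observed frequency class can be reweighted by the inverse of that chance. The function names `ascertainment_prob` and `correct_sfs` are hypothetical.

```python
from math import comb


def ascertainment_prob(i, n, d):
    """Probability that a site with i copies of one allele among n chromosomes
    is variable in a random discovery subsample of d chromosomes (assumed scheme)."""
    if not 2 <= d <= n:
        raise ValueError("discovery depth d must satisfy 2 <= d <= n")
    # The site is missed only when all d discovery chromosomes carry the same allele.
    missed = comb(i, d) + comb(n - i, d)
    return 1.0 - missed / comb(n, d)


def correct_sfs(observed, n, d):
    """Reweight an observed spectrum by the inverse ascertainment probability.

    observed[i - 1] is the number of SNPs with i copies of the allele, i = 1..n-1.
    Returns the corrected spectrum, normalized to sum to 1.
    """
    if len(observed) != n - 1:
        raise ValueError("observed must have n - 1 entries")
    weights = [observed[i - 1] / ascertainment_prob(i, n, d) for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Toy counts for n = 4 chromosomes and a discovery depth of d = 2.
    print(correct_sfs([1, 1, 1], n=4, d=2))
```

Rare alleles are the least likely to be seen in a small discovery sample, so the reweighting shifts mass back toward the low-frequency classes. When discovery mixes several schemes of unknown depth, as the abstract describes for HapMap, a single value of d cannot capture the process, which is consistent with the reported difficulty of fully correcting those data.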

