Department of Genetics, University of North Carolina at Chapel Hill, 27599-7264, USA.
BMC Genomics. 2012 Jan 19;13:34. doi: 10.1186/1471-2164-13-34.
High-density genotyping arrays that measure hybridization of genomic DNA fragments to allele-specific oligonucleotide probes are widely used to genotype single nucleotide polymorphisms (SNPs) in genetic studies, including human genome-wide association studies. Hybridization intensities are converted to genotype calls by clustering algorithms that assign each sample to a genotype class at each SNP. Data for SNP probes that do not conform to the expected pattern of clustering are often discarded, which contributes to ascertainment bias and loses information: as many as 50% of probes were discarded in a recent genome-wide association study in dogs.
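To make the clustering step concrete, the following is a minimal Python sketch, not the algorithm of any particular array platform or software package. It assumes each sample contributes two probe intensities at a SNP (allele A and allele B), uses the contrast (A - B)/(A + B) as the clustering feature, and substitutes fixed thresholds for a fitted cluster model. The function names and the 0.4 cutoff are illustrative assumptions.

```python
def contrast(a, b):
    """Return (a - b) / (a + b): near +1 when only allele A hybridizes, near -1 for allele B."""
    total = a + b
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return (a - b) / total


def call_genotype(a, b, hom_cutoff=0.4):
    """Assign one sample to a genotype class at one SNP.

    hom_cutoff is a hypothetical fixed threshold; real callers estimate
    cluster boundaries from the data rather than hard-coding them.
    """
    c = contrast(a, b)
    if c >= hom_cutoff:
        return "AA"
    if c <= -hom_cutoff:
        return "BB"
    return "AB"


# Made-up (A, B) intensity pairs for three samples at a single SNP.
samples = {
    "s1": (1800.0, 200.0),
    "s2": (950.0, 1050.0),
    "s3": (150.0, 1900.0),
}
calls = {name: call_genotype(a, b) for name, (a, b) in samples.items()}
print(calls)
```

A probe whose samples do not fall cleanly into these three classes is the kind of probe that is typically discarded.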
We identified atypical patterns of hybridization intensities that were highly reproducible and demonstrated that these patterns represent genetic variants that were not accounted for in the design of the array platform. We characterized variable intensity oligonucleotide (VINO) probes that display such patterns and are found in all hybridization-based genotyping platforms, including those developed for human, dog, cattle, and mouse. When recognized and properly interpreted, VINOs recovered a substantial fraction of discarded probes and counteracted SNP ascertainment bias. We developed software (MouseDivGeno) that identifies VINOs and improves the accuracy of genotype calling. MouseDivGeno produced genotype calls that were highly concordant with those of other methods, and it alone identified more than 786,000 VINOs in 351 mouse samples. We used whole-genome sequence from 14 mouse strains to confirm the presence of novel variants explaining 28,000 VINOs in those strains. We also identified VINOs in human HapMap 3 samples, many of which were specific to an African population. Incorporating VINOs in phylogenetic analyses substantially improved the accuracy of a Mus species tree and local haplotype assignment in laboratory mouse strains.
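As a rough illustration of how an atypical intensity pattern might be flagged, the Python sketch below marks samples whose summed intensity at a probe falls well below that of the other samples. This is not the MouseDivGeno method. It assumes, for illustration only, that an unaccounted-for variant under the probe lowers total hybridization intensity, and the function name, the median-based comparison, and the one-log2-unit cutoff are all hypothetical choices.

```python
import math
from statistics import median


def flag_low_intensity(intensities, min_log2_drop=1.0):
    """Return sorted names of samples whose summed A+B intensity lies at
    least min_log2_drop log2 units below the median for this probe.

    intensities maps sample name -> (A intensity, B intensity).
    min_log2_drop is a hypothetical cutoff, not a published parameter.
    """
    log_totals = {name: math.log2(a + b) for name, (a, b) in intensities.items()}
    center = median(log_totals.values())
    return sorted(
        name for name, value in log_totals.items()
        if center - value >= min_log2_drop
    )


# Made-up (A, B) intensity pairs for five samples at a single probe.
probe = {
    "s1": (1800.0, 200.0),
    "s2": (950.0, 1050.0),
    "s3": (150.0, 1900.0),
    "s4": (300.0, 100.0),   # much weaker signal than the others
    "s5": (1700.0, 250.0),
}
print(flag_low_intensity(probe))
```

Rather than discarding such a probe, the flagged samples can be treated as carrying an additional, informative allele class.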
The problems of ascertainment bias and missing information due to genotyping errors are widely recognized as limiting factors in genetic studies. We have conducted the first formal analysis of the effect of novel variants on genotyping arrays, and we have shown that these variants account for a large portion of miscalled and uncalled genotypes. Genetic studies will benefit from substantial improvements in the accuracy of their results by incorporating VINOs in their analyses.