Yu Zhaoxia, Garner Chad, Ziogas Argyrios, Anton-Culver Hoda, Schaid Daniel J
Department of Statistics, University of California, Irvine, CA, USA.
BMC Bioinformatics. 2009 Feb 20;10:63. doi: 10.1186/1471-2105-10-63.
Genome-wide association studies with single nucleotide polymorphisms (SNPs) show great promise for identifying genetic determinants of complex human traits. In current analyses, genotype calling and imputation of missing genotypes are usually treated as two separate tasks. The genotypes of SNPs are first determined one at a time from allele signal intensities. Then the missing genotypes, i.e., no-calls caused by imperfectly separated signal clouds, are imputed based on the linkage disequilibrium (LD) between multiple SNPs. Although many statistical methods have been developed to improve either genotype calling or imputation of missing genotypes, treating the two steps independently can lead to loss of genetic information.
We propose a novel genotype calling framework. In this framework, we consider the signal intensities and underlying LD structure of SNPs simultaneously by estimating both cluster parameters and haplotype frequencies. As a result, our new method outperforms some existing algorithms in terms of both call rates and genotyping accuracy. Our studies also suggest that jointly analyzing multiple SNPs in LD provides more accurate estimation of haplotypes than haplotype reconstruction methods that only use called genotypes.
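The idea of combining signal intensities with LD can be illustrated by a minimal sketch. This is not the authors' algorithm: the framework described above estimates cluster parameters and haplotype frequencies from the data, whereas the sketch assumes both are already known and fixed. The one-dimensional Gaussian cluster model, the Hardy-Weinberg pairing of haplotypes, and all function names and numeric values are hypothetical choices made only for illustration.

```python
import math
from itertools import product

# Illustrative sketch only: cluster parameters and haplotype frequencies are
# ASSUMED known here; the paper's framework estimates them from the data.

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def intensity_likelihoods(x, means, sd):
    """P(intensity x | genotype g) for g = 0, 1, 2 copies of the B allele."""
    return [normal_pdf(x, m, sd) for m in means]

def two_snp_genotype_prior(hap_freq):
    """P(g1, g2) for two SNPs, pairing haplotypes at random (Hardy-Weinberg).

    hap_freq maps a haplotype (a1, a2), with alleles coded 0/1, to its frequency.
    """
    prior = {}
    for (h, fh), (k, fk) in product(hap_freq.items(), repeat=2):
        g = (h[0] + k[0], h[1] + k[1])
        prior[g] = prior.get(g, 0.0) + fh * fk
    return prior

def joint_posterior(x1, x2, means, sd, hap_freq):
    """Posterior over (g1, g2) given the intensities at both SNPs."""
    l1 = intensity_likelihoods(x1, means, sd)
    l2 = intensity_likelihoods(x2, means, sd)
    prior = two_snp_genotype_prior(hap_freq)
    unnorm = {g: p * l1[g[0]] * l2[g[1]] for g, p in prior.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}

# Hypothetical inputs: three genotype clusters and two SNPs in strong LD.
means, sd = [-1.0, 0.0, 1.0], 0.3
hap_freq = {(0, 0): 0.60, (1, 1): 0.38, (0, 1): 0.01, (1, 0): 0.01}

# SNP 1 lies midway between the heterozygous and homozygous-B clusters;
# SNP 2 sits on the homozygous-B cluster.
posterior = joint_posterior(0.5, 1.0, means, sd, hap_freq)
best_call = max(posterior, key=posterior.get)
```

In this toy setting the intensity at the first SNP is equally compatible with genotypes 1 and 2, so calling that SNP by itself would likely produce a no-call or a call driven by allele frequency alone. Because the two SNPs are in strong LD, the clear signal at the second SNP shifts most of the joint posterior onto one genotype pair, which is the kind of information a one-SNP-at-a-time pipeline discards.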
Our study demonstrates that jointly analyzing signal intensities and LD structure of multiple SNPs is a better way to determine genotypes and estimate LD parameters.