Wilkening Stefan, Chen Bowang, Wirtenberger Michael, Burwinkel Barbara, Försti Asta, Hemminki Kari, Canzian Federico
Department of Molecular Genetic Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany.
BMC Genomics. 2007 Mar 16;8:77. doi: 10.1186/1471-2164-8-77.
Genotyping technologies for whole genome association studies are now available. To perform such studies at an affordable price, pooled DNA can be used. Recent studies have shown that GeneChip Human Mapping 10 K and 50 K arrays are suitable for estimating allele frequencies in pooled DNA. In the present study, we tested the accuracy of the 250 K Nsp array, which is part of the 500 K array set representing 500,568 SNPs. Furthermore, we compared different algorithms for estimating allele frequencies in pooled DNA.
We confirmed that polynomial-based probe-specific correction (PPC) was the most accurate method for allele frequency estimation. However, a simple k-correction, using the relative allele signal (RAS) of heterozygous individuals, performed only slightly worse and provided results for more SNPs. Using four replicates of the 250 K array and the k-correction with heterozygous RAS values, we obtained results for 104,141 SNPs. The correlation between estimated and real allele frequencies was 0.983 and the average error was 0.046, which was comparable to the results obtained with the 10 K array. Furthermore, we showed that the estimation accuracy depended on the SNP type (average error for A/T SNPs: 0.043; for G/C SNPs: 0.052).
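The abstract does not spell out the k-correction formula, so the following is only a minimal sketch under a commonly used formulation: RAS is taken as A / (A + B) for allele signals A and B, the factor k is the A/B signal ratio observed in heterozygous individuals (where the true ratio is 1:1), and the pool estimate is RAS / (RAS + k(1 - RAS)). The function names and the averaging of replicates by a simple mean are assumptions made for illustration, not the authors' implementation.

```python
def k_from_het_ras(het_ras_values):
    """Estimate k for one SNP from the RAS values of heterozygous individuals.

    Assumption: RAS = A / (A + B), so the observed A/B ratio in a
    heterozygote (true ratio 1:1) is RAS / (1 - RAS).
    """
    mean_ras = sum(het_ras_values) / len(het_ras_values)
    if not 0.0 < mean_ras < 1.0:
        raise ValueError("mean heterozygous RAS must lie strictly between 0 and 1")
    return mean_ras / (1.0 - mean_ras)


def k_corrected_frequency(pool_ras, k):
    """Correct a pooled RAS value for unequal allele signal strength."""
    return pool_ras / (pool_ras + k * (1.0 - pool_ras))


def estimate_allele_frequency(pool_ras_replicates, het_ras_values):
    """Average the pooled RAS over replicate arrays, then apply the k-correction.

    Assumption: replicates are combined with a plain arithmetic mean.
    """
    mean_pool_ras = sum(pool_ras_replicates) / len(pool_ras_replicates)
    k = k_from_het_ras(het_ras_values)
    return k_corrected_frequency(mean_pool_ras, k)
```

Under these assumptions, a pool whose RAS equals the mean heterozygous RAS is mapped to an allele frequency of 0.5, and k = 1 (equal signal strength for both alleles) leaves the pooled RAS unchanged.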
The combination of DNA pooling and analysis of single nucleotide polymorphisms (SNPs) on high density microarrays is a promising tool for whole genome association studies.