Department of Biological Sciences, National University of Singapore, Science Drive 4, Singapore.
BMC Genet. 2009 Dec 14;10:82. doi: 10.1186/1471-2156-10-82.
The use of pooled DNA on SNP microarrays (SNP-MaP) has been shown to be a cost-effective and rapid way to perform whole-genome association evaluations. While the accuracy of SNP-MaP was extensively evaluated on the early Affymetrix 10 k and 100 k platforms, fewer similarly comprehensive studies exist for more recent platforms. In the present study, we used data generated from the full Affymetrix 500 k SNP set, together with the polynomial-based probe-specific correction (PPC), to derive allele frequency estimates. These estimates were compared with genotyping results for the same individuals on the same platform, as the basis for evaluating the reliability and accuracy of pooled genotyping on these high-throughput platforms. We subsequently extended this comparison to the new SNP6.0 platform, which is capable of genotyping 1.8 million genetic variants.
We showed that pooled genotyping on the 500 k platform performed as well as previously reported for the lower-throughput 10 k and 100 k array sets, with high accuracy (correlation coefficient: 0.988) and low median error (0.036) in allele frequency estimates. Similar results were obtained from the SNP6.0 array set. A novel pooling strategy of overlapping sub-pools was tested, and comparison of the estimated allele frequencies showed this strategy to be as reliable as replicate pools. The importance of an appropriate reference genotyping data set for applying the PPC algorithm was also evaluated; reference samples of similar ethnic background to the pooled samples were found to improve estimation of allele frequencies.
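The two accuracy figures quoted above (a correlation coefficient and a median error) can be illustrated with a minimal sketch. It assumes the correlation is a Pearson coefficient and that "median error" means the median absolute difference between pooled estimates and frequencies derived from individual genotypes; the abstract does not spell out either definition. The function name `accuracy_metrics` and its arguments are hypothetical.

```python
import numpy as np

def accuracy_metrics(pooled_freq, individual_freq):
    """Compare pooled allele frequency estimates with frequencies
    computed from individual genotypes, one value per SNP.

    Assumption: Pearson correlation and median absolute difference
    stand in for the abstract's 'correlation coefficient' and
    'median error'.
    """
    pooled = np.asarray(pooled_freq, dtype=float)
    individual = np.asarray(individual_freq, dtype=float)
    if pooled.shape != individual.shape:
        raise ValueError("both inputs must contain one value per SNP")

    # Pearson correlation across SNPs
    r = float(np.corrcoef(pooled, individual)[0, 1])
    # Median of the per-SNP absolute differences
    median_abs_error = float(np.median(np.abs(pooled - individual)))
    return r, median_abs_error
```

Calling `accuracy_metrics(pooled, individual)` with two equal-length arrays of per-SNP frequencies returns the pair `(r, median_abs_error)`.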
We conclude that estimating allele frequencies with the PPC algorithm from pooled genotyping on the high-throughput 500 k and SNP6.0 platforms is highly accurate and reproducible, especially when a suitable reference sample set is used to estimate the beta values for PPC.
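To make the role of the reference sample set concrete, the following is a rough sketch of a per-probe polynomial calibration. It is not the published PPC algorithm, whose exact model is not given in this abstract. It assumes that each probe yields a relative allele signal (RAS) between 0 and 1, that reference individuals of known genotype provide (RAS, allele fraction) pairs with fractions 0, 0.5 and 1, and that a second-degree polynomial is fitted per probe. The fitted coefficients are only loosely analogous to the beta values mentioned above. All function and variable names are hypothetical.

```python
import numpy as np

def fit_probe_calibration(ref_ras, ref_allele_fraction):
    """Fit a degree-2 polynomial mapping a probe's relative allele
    signal (RAS) to the true allele fraction, using reference
    individuals of known genotype (assumed fractions: 0, 0.5, 1).
    """
    return np.polyfit(ref_ras, ref_allele_fraction, deg=2)

def correct_pool_ras(pool_ras, coeffs):
    """Apply the fitted polynomial to a pooled-sample RAS value and
    clip the result to the valid frequency range [0, 1].
    """
    estimate = np.polyval(coeffs, pool_ras)
    return np.clip(estimate, 0.0, 1.0)

# Illustrative values only (made up, not data from the study):
# mean RAS observed for the three reference genotype classes.
coeffs = fit_probe_calibration([0.1, 0.6, 0.9], [0.0, 0.5, 1.0])
pool_estimate = correct_pool_ras(0.7, coeffs)
```

Under these assumptions, a reference set whose probe behaviour differs from that of the pooled samples would give a poorly matched calibration curve, which is consistent with the finding that ethnically similar reference samples improve the estimates.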