Schnack Hugo G, Bakker Steven C, van 't Slot Ruben, Groot Bart M, Sinke Richard J, Kahn Rene S, Pearson Peter L
Department of Psychiatry, University Medical Center Utrecht, The Netherlands.
Eur J Hum Genet. 2004 Nov;12(11):925-34. doi: 10.1038/sj.ejhg.5201234.
Pooling of DNA samples instead of individual genotyping can speed up genetic association studies. However, for microsatellite markers, the electrophoretic pattern of DNA pools can be complex, and procedures for deriving allele frequencies are often confounded by PCR-induced stutter artefacts. We have developed a mathematical procedure to remove stutter noise and accurately determine allele frequencies in pools. A stutter correction model can be reliably derived from one standard 'training set' of the same 10 individual DNA samples for each marker, which can also include heterozygous patterns with partially overlapping peaks. Compared with earlier methods, this reduces the number of genotypes needed in the training set considerably, and allows standardization of analyses for different markers. Moreover, the use of a procedure that fits all data simultaneously makes the method less sensitive to aberrant data. The model was tested with 34 markers, 18 of which were newly defined from human sequence data. Allele frequencies derived from stutter-corrected DNA pool patterns were compared with the summed individual genotyping results of all the individuals in the pools (n = 109 and n = 64). We show that the model is robust and accurately extracts allele frequencies from pooled DNA samples for 32 of the 34 microsatellite markers tested. Finally, we performed a case-control study in celiac disease and found that weakly associated disease alleles, identified by individual genotyping, were only detectable in pools after stutter correction. This efficient method for correcting stutter artefacts in microsatellite markers enables large-scale genetic association studies using DNA pools to be performed.
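The idea of removing stutter from a pooled peak pattern can be illustrated with a small sketch. This is not the authors' published model, whose details are not given in the abstract. It assumes, for illustration only, that every allele produces a main peak plus stutter peaks one and two repeats shorter at fixed relative heights, so that the observed pool pattern is a linear mixture of the true allele frequencies. The function names (`stutter_matrix`, `correct_pool`) and the stutter ratios are hypothetical.

```python
import numpy as np

def stutter_matrix(n_alleles, stutter_ratios):
    """Build matrix S so that observed_peaks = S @ true_frequencies.

    Assumption (illustrative only): each allele gives a main peak of
    relative height 1 and stutter peaks 1, 2, ... repeats shorter with
    the fixed relative heights listed in stutter_ratios.
    """
    profile = [1.0] + list(stutter_ratios)
    S = np.zeros((n_alleles, n_alleles))
    for allele in range(n_alleles):
        for shift, height in enumerate(profile):
            peak = allele - shift  # stutter appears at shorter fragment lengths
            if peak >= 0:          # peaks below the shortest allele are ignored
                S[peak, allele] = height
    return S

def correct_pool(observed, stutter_ratios):
    """Estimate allele frequencies from an observed pool peak pattern."""
    observed = np.asarray(observed, dtype=float)
    S = stutter_matrix(len(observed), stutter_ratios)
    raw = np.linalg.solve(S, observed)  # undo the assumed stutter mixing
    raw = np.clip(raw, 0.0, None)       # frequencies cannot be negative
    return raw / raw.sum()              # normalise so frequencies sum to 1

# Simulate a pool with known frequencies, then recover them.
true_freqs = np.array([0.1, 0.2, 0.4, 0.3])
ratios = [0.3, 0.1]  # hypothetical stutter heights at -1 and -2 repeats
observed = stutter_matrix(4, ratios) @ true_freqs
estimated = correct_pool(observed, ratios)
```

In practice the stutter ratios would have to be estimated from individual genotypes, which is the role the abstract assigns to the 10-sample training set, and noisy data would call for a fitting procedure rather than an exact matrix solve.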