Macgregor Stuart, Visscher Peter M, Montgomery Grant
Genetic Epidemiology, Queensland Institute of Medical Research, Brisbane, Australia.
Nucleic Acids Res. 2006 Apr 20;34(7):e55. doi: 10.1093/nar/gkl136.
Array-based DNA pooling techniques facilitate genome-wide genotyping of large samples. We describe a structured analysis method for pooled data that uses the internal replication information in large-scale genotyping sets. The method uses information from single nucleotide polymorphisms (SNPs) typed in parallel on a high-density array to construct a test statistic with desirable statistical properties. We use a general linear model to account appropriately for the structured multiple measurements available with array data. The method does not require additional arrays to estimate unequal hybridization rates, so it scales readily to arrays with several hundred thousand SNPs. Tests for differences between cases and controls can be conducted with very few arrays. We demonstrate the method on 384 endometriosis cases and 384 controls, typed using Affymetrix GeneChip® HindIII 50K arrays. For a subset of these data, accurate measures of hybridization rates were available, and assuming equal hybridization rates is shown to have a negligible effect on the results. With a total of only six arrays, the method extracted one-third of the information (in terms of equivalent sample size) available with individual genotyping, which would require 768 arrays. With 20 arrays (10 for cases, 10 for controls), over half of the information in this sample could be extracted.
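The abstract does not give the model equations, so the following Python sketch only illustrates, under simplifying assumptions that are ours and not the authors', three ideas it mentions. First, it turns allele-specific (A, B) signals into a pool allele-frequency estimate using the ratio A/(A + kB). Here k is a hypothetical hybridization correction, and k = 1 corresponds to assuming equal hybridization rates. Second, it runs a case-control test that treats arrays as the unit of replication. For a balanced design with arrays nested within group, this t statistic squared equals the group F statistic of a nested linear model. Third, it expresses a pooled design's precision as a fraction of the equivalent sample size under individual genotyping. All function names, the ratio estimator, the averaging over probe pairs and the toy variance model in `relative_information` are illustrative assumptions, not the published model.

```python
import math
from statistics import mean, variance


def pooled_allele_freq(signal_a, signal_b, k=1.0):
    """Allele-A frequency estimate from one (A, B) signal pair (assumed ratio estimator).

    k is a hypothetical hybridization correction (A/B signal ratio expected in
    heterozygotes); k = 1.0 assumes equal hybridization rates.
    """
    return signal_a / (signal_a + k * signal_b)


def array_freq(probe_pairs, k=1.0):
    """Average the estimate over the probe-level (A, B) pairs measured on one array."""
    return mean(pooled_allele_freq(a, b, k) for a, b in probe_pairs)


def array_level_t(case_arrays, control_arrays, k=1.0):
    """Two-sample pooled-variance t statistic on per-array frequency estimates.

    Arrays are the replication unit, so probe-level measurements on the same
    array are not treated as independent. For a balanced nested design, t**2
    equals the group F statistic with array-within-group as the error term.
    Requires at least two arrays per group.
    """
    cases = [array_freq(arr, k) for arr in case_arrays]
    controls = [array_freq(arr, k) for arr in control_arrays]
    n1, n2 = len(cases), len(controls)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(cases) - mean(controls)) / se, df


def relative_information(n_per_pool, arrays_per_pool, error_var, p):
    """Equivalent sample size of a pool as a fraction of individual genotyping.

    Toy variance model (assumption): binomial sampling variance p(1-p)/(2n)
    plus array measurement-error variance averaged over replicate arrays.
    """
    sampling_var = p * (1 - p) / (2 * n_per_pool)
    total_var = sampling_var + error_var / arrays_per_pool
    return sampling_var / total_var


if __name__ == "__main__":
    # Made-up probe-level (A, B) signals: 3 arrays per group, 2 probe pairs each.
    case_arrays = [[(60, 40), (62, 38)], [(58, 42), (61, 39)], [(59, 41), (60, 40)]]
    control_arrays = [[(50, 50), (52, 48)], [(49, 51), (51, 49)], [(50, 50), (48, 52)]]
    t, df = array_level_t(case_arrays, control_arrays)
    print(f"t = {t:.2f} on {df} df")
    for arrays in (3, 10):
        frac = relative_information(384, arrays, error_var=0.0078125 * 0.25, p=0.5)
        print(f"{arrays} arrays per pool: {frac:.3f} of individual-genotyping information")
```

The error variance 0.0078125 × p(1 − p) was chosen purely so that, with 384 individuals per pool, 3 arrays per pool give one-third of the information. With that value the same toy model gives 0.625 at 10 arrays per pool. This shows how extra replicate arrays shrink the measurement-error term. These numbers are not estimates from the endometriosis data.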