Johnson Toby
School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3JT, UK.
Biostatistics. 2007 Jul;8(3):546-65. doi: 10.1093/biostatistics/kxl028. Epub 2006 Sep 19.
Association mapping studies aim to determine the genetic basis of a trait. A common experimental design uses a sample of unrelated individuals classified into two groups, for example cases and controls. If the trait has a complex genetic basis, consisting of many quantitative trait loci (QTLs), each group needs to be large. Each group must be genotyped at marker loci covering the region of interest; for dense coverage of a large candidate region, or a whole-genome scan, the number of markers will be very large. The total amount of genotyping required for such a study is formidable. A labor-saving laboratory technique called DNA pooling could reduce the amount of genotyping required, but the data generated are less informative and require novel methods for efficient analysis. In this paper, a Bayesian statistical analysis of the classic model of McPeek and Strahs is proposed. In contrast to previous work on this model, I assume that data are collected using DNA pooling, so that individual genotypes are not directly observed, and I also account for experimental errors. A complete analysis can be performed using analytical integration, a propagation algorithm for a hidden Markov model, and quadrature. The method developed here is both statistically and computationally efficient. It allows simultaneous detection and mapping of a QTL, in a large-scale association mapping study, using data from pooled DNA. The method performs well on data sets simulated under a realistic coalescent-with-recombination model and outperforms classical single-point methods. The method is illustrated on data consisting of 27 markers in an 880-kb region around the CYP2D6 gene.
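The abstract names a propagation algorithm for a hidden Markov model as one ingredient of the analysis. As a rough illustration of that kind of computation only, the sketch below implements the generic scaled forward recursion for a discrete HMM. It is not the model of McPeek and Strahs or the pooled-DNA likelihood of the paper: the function name `forward_log_likelihood`, the two-state example, and all probability values are hypothetical and chosen purely for demonstration.

```python
import math


def forward_log_likelihood(init, trans, emit, observations):
    """Log-probability of an observation sequence under a discrete HMM.

    init[s]      : probability of starting in hidden state s
    trans[r][s]  : probability of moving from state r to state s
    emit[s][o]   : probability of observing symbol o in state s
    All inputs are illustrative assumptions, not quantities from the paper.
    """
    if not observations:
        return 0.0  # an empty sequence has probability 1

    n_states = len(init)
    # Start with the prior weighted by the first emission.
    alpha = [init[s] * emit[s][observations[0]] for s in range(n_states)]
    log_lik = 0.0

    for t in range(len(observations)):
        if t > 0:
            # Propagate one step along the chain, then weight by the emission.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states))
                * emit[s][observations[t]]
                for s in range(n_states)
            ]
        # Rescale to avoid underflow and accumulate the log of the scale.
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]

    return log_lik


# Hypothetical two-state example with two possible observed symbols.
example_init = [0.5, 0.5]
example_trans = [[0.9, 0.1], [0.1, 0.9]]
example_emit = [[0.8, 0.2], [0.3, 0.7]]
example_log_lik = forward_log_likelihood(
    example_init, example_trans, example_emit, [0, 1]
)
```

The cost grows linearly with the number of positions along the chain, which is the usual reason such recursions stay tractable when the number of markers is large.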