Thomas Alun, Camp Nicola J
Department of Medical Informatics, University of Utah, Salt Lake City, UT 84108, USA.
Am J Hum Genet. 2004 Jun;74(6):1088-101. doi: 10.1086/421249. Epub 2004 Apr 26.
Pairwise linkage disequilibrium, haplotype blocks, and recombination hotspots provide only a partial description of the patterns of dependences and independences between the allelic states at proximal loci. On the gross scale, where recombination and spatial relationships dominate, the associations can be reasonably described in these terms. However, on the fine scale of current high-density maps, the mutation process is also important and creates associations between loci that are independent of the physical ordering and that cannot be summarized with pairwise measures of association. Graphical modeling provides a standard statistical framework for characterizing precisely these sorts of complex stochastic data. Although graphical models are often used in situations in which assumptions lead naturally to specific models, it is less well known that estimation of graphical models is also a developed field. We show how decomposable graphical models can be fitted to dense genetic data. The objective function is the maximized log likelihood for the model penalized by a multiple of the model's degrees of freedom. We also describe how this can be modified to incorporate prior information about locus position. Simulated annealing is used to find good solutions. Part of the appeal of this approach is that categorical phenotypes can be included in the same analysis and association with polymorphisms can be assessed jointly with the interlocus associations. We illustrate our method with genotypic data from 25 loci in the ELAC2 gene. The results contain third- and fourth-order locus interactions and show that, at this density of markers, linkage disequilibrium is not a simple function of physical distance. Graphical models provide more flexibility to express these features of the joint distribution of alleles than do monotonic functions connecting physical and genetic maps.
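The objective function described above can be illustrated with a minimal sketch. It is not the authors' implementation: it relies on the standard result that, for a decomposable model, the maximized log likelihood factorizes over the cliques and separators of the graph, and the degrees of freedom decompose in the same way. The function names, the penalty multiplier `alpha`, and the data layout (one tuple of allelic states per observation) are all assumptions made for illustration. The caller is assumed to supply cliques and separators from a valid junction tree; the sketch does not check decomposability, and the simulated annealing search over graphs is not shown.

```python
import math
from collections import Counter


def marginal_loglik(data, variables):
    """Sum of n(x) * log(n(x) / N) over the observed configurations x
    of the given variables (indices into each data row)."""
    n_total = len(data)
    counts = Counter(tuple(row[v] for v in variables) for row in data)
    return sum(n * math.log(n / n_total) for n in counts.values())


def degrees_of_freedom(variables, levels):
    """Free parameters of a saturated table on the given variables.
    An empty variable set gives 0."""
    return math.prod(levels[v] for v in variables) - 1


def penalized_score(data, cliques, separators, levels, alpha):
    """Maximized log likelihood minus alpha times the degrees of freedom.

    cliques, separators: lists of tuples of variable indices, assumed
    to come from a junction tree of a decomposable graph (not checked).
    levels: number of states of each variable (hypothetical layout).
    alpha: penalty multiplier (hypothetical name).
    """
    loglik = (sum(marginal_loglik(data, c) for c in cliques)
              - sum(marginal_loglik(data, s) for s in separators))
    df = (sum(degrees_of_freedom(c, levels) for c in cliques)
          - sum(degrees_of_freedom(s, levels) for s in separators))
    return loglik - alpha * df


if __name__ == "__main__":
    # Toy data: two biallelic loci in complete association.
    toy = [(0, 0), (0, 0), (1, 1), (1, 1)]
    levels = [2, 2]
    # Graph with no edge: two singleton cliques, one empty separator.
    independent = penalized_score(toy, [(0,), (1,)], [()], levels, alpha=0.5)
    # Graph with one edge: a single clique containing both loci.
    joint = penalized_score(toy, [(0, 1)], [], levels, alpha=0.5)
    print("independent:", independent, "joint:", joint)
```

In a search such as simulated annealing, a score of this kind would be recomputed (or updated locally) each time an edge is proposed for addition or removal, with the candidate graph kept decomposable. Because the score is a sum over cliques and separators, only the terms touched by the changed edge would need to be re-evaluated.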