Lemire Mathieu
McGill University and Genome Quebec Innovation Centre, Montreal, Canada.
BMC Genet. 2006 Jul 3;7:40. doi: 10.1186/1471-2156-7-40.
Recent advances in high-throughput genotyping technologies allow for large-scale association mapping of human complex traits, and promising statistical designs and methods have been emerging as a result. Efficient simulation software is a key element in evaluating the properties of new statistical tests. SLINK is a flexible simulation tool that has been widely used to simulate the segregation and recombination processes of markers linked to, and possibly associated with, a trait locus, conditional on trait values in arbitrary pedigrees. In practice, its most serious limitation is the small number of loci that can be simulated, since the complexity of the algorithm grows exponentially with this number.
I describe the implementation of a two-step algorithm to be used in conjunction with SLINK to enable the simulation of a large number of marker loci linked to a trait locus, conditional on trait values in families, with the possibility for the loci to be in linkage disequilibrium. In the first step, SLINK is used to simulate genotypes at the trait locus conditional on the observed trait values, and also to generate an indicator of the descent path of the simulated alleles. In the second step, marker alleles or haplotypes are generated in the founders, conditional on the trait locus genotypes simulated in the first step. The recombination process between the marker loci is then simulated conditionally on the descent path and on the trait locus genotypes. This two-step implementation is often computationally faster than other software designed to generate marker data linked to, and possibly associated with, a trait locus.
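The second step can be illustrated with a minimal sketch. It is not the author's implementation: it assumes no crossover interference (recombination events in adjacent intervals are independent), a single chromosome with the trait locus placed among the markers, and a descent indicator coded as 0 or 1 for which parental haplotype was transmitted at the trait locus. The function names (`draw_founder_haplotype`, `segregation_origins`, `transmit_markers`), the data layout, and the haplotype frequencies in the usage example are hypothetical choices made for illustration.

```python
import random


def draw_founder_haplotype(trait_allele, cond_freqs, rng):
    """Draw a founder marker haplotype given the allele carried at the trait locus.

    cond_freqs maps each trait allele to {marker_haplotype: probability};
    unequal distributions across trait alleles encode linkage disequilibrium.
    (Hypothetical data layout, assumed for this sketch.)
    """
    haplotypes = list(cond_freqs[trait_allele])
    weights = [cond_freqs[trait_allele][h] for h in haplotypes]
    return rng.choices(haplotypes, weights=weights, k=1)[0]


def segregation_origins(origin_at_trait, trait_pos, thetas, rng):
    """Simulate which parental haplotype (0 or 1) is transmitted at each locus.

    Loci are ordered along the chromosome with the trait locus at index
    trait_pos; thetas[j] is the recombination fraction between loci j and j+1.
    Starting from the known origin at the trait locus, the origin switches
    between adjacent loci with probability theta (no interference assumed).
    """
    n_loci = len(thetas) + 1
    origins = [None] * n_loci
    origins[trait_pos] = origin_at_trait
    # Walk to the right of the trait locus.
    for j in range(trait_pos + 1, n_loci):
        switch = rng.random() < thetas[j - 1]
        origins[j] = 1 - origins[j - 1] if switch else origins[j - 1]
    # Walk to the left of the trait locus.
    for j in range(trait_pos - 1, -1, -1):
        switch = rng.random() < thetas[j]
        origins[j] = 1 - origins[j + 1] if switch else origins[j + 1]
    return origins


def transmit_markers(parent_haps, origins, trait_pos):
    """Build the transmitted marker haplotype from the per-locus origins.

    parent_haps is a pair of marker-allele sequences (trait locus excluded).
    """
    marker_origins = origins[:trait_pos] + origins[trait_pos + 1:]
    return [parent_haps[o][m] for m, o in enumerate(marker_origins)]


if __name__ == "__main__":
    rng = random.Random(2006)
    # Illustrative conditional haplotype frequencies (made-up numbers).
    cond_freqs = {
        "D": {(1, 1, 2, 1): 0.7, (2, 1, 1, 2): 0.3},
        "d": {(1, 1, 2, 1): 0.2, (2, 1, 1, 2): 0.8},
    }
    parent = (
        draw_founder_haplotype("D", cond_freqs, rng),
        draw_founder_haplotype("d", cond_freqs, rng),
    )
    # Trait locus sits between markers 2 and 3 (index 2 among 5 loci).
    origins = segregation_origins(0, 2, [0.01, 0.01, 0.01, 0.01], rng)
    print(transmit_markers(parent, origins, 2))
```

Because each marker only requires one random draw per meiosis once the origin at the trait locus is fixed, the cost of this step grows linearly with the number of markers, which is the reason for separating it from the likelihood-based simulation at the trait locus.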
Because the proposed method uses SLINK to simulate the segregation process, it inherits SLINK's flexibility: the trait may be qualitative, with the possibility of defining different liability classes (which allows for the simulation of gene-environment interactions or even of multi-locus effects between unlinked susceptibility regions), or it may be quantitative and normally distributed. In particular, this implementation is the only one available that can generate a large number of marker loci conditional on the set of observed quantitative trait values in pedigrees.