Bass M P, Martin E R, Hauser E R
Department of Medicine, Center for Human Genetics, 595 LaSalle St., Box 3445, Duke University Medical Center, Durham, NC 27710, USA.
Pac Symp Biocomput. 2004:93-103. doi: 10.1142/9789812704856_0010.
We have developed a software package, SIMLA (simulation of linkage and association), which can be used to generate pedigree data under user-specified conditions. The number and location of disease loci, disease penetrances, marker locations, and marker disequilibrium with a disease locus and with other markers can be controlled. In addition, the pedigree size and availability of genotype data can be specified, and a number of rules for family ascertainment are available. Power and type I error rates can be estimated under a variety of conditions, as needed by the user. We developed this simulation program because there are no publicly available programs to simulate variable levels of both recombination and linkage disequilibrium (LD) in general pedigrees. Genetic researchers routinely apply both tests of linkage and family-based tests of association in the search for complex disease genes, and a plethora of different statistical approaches are available. Thus there is a need for the flexible statistical simulation program that we describe. This is the only program that we are aware of that allows simulation of linkage and association for multiple markers in extended pedigrees, in nuclear families, or in sets of unrelated cases and controls. Furthermore, the program allows for variable levels of LD not only among markers but also between markers and disease loci. SIMLA can simulate the complex and variable levels of LD that have been observed between closely spaced markers across the genome and allows for realistic simulation of complex relationships between markers. The program will be useful for studying and comparing existing statistical tests, developing new genetic linkage and association statistics, planning sample sizes for new studies, and interpreting genetic analysis results.
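To make the idea of a user-specified level of LD between a marker and a disease locus concrete, the following is a minimal sketch in Python. It is not SIMLA's algorithm or interface: the function names, the two-allele model, and the restriction to positive LD (D' between 0 and 1) are all assumptions made for illustration. It builds the four two-locus haplotype frequencies from the allele frequencies and D', then draws founder haplotypes from them.

```python
import random


def haplotype_freqs(p_disease, p_marker, d_prime):
    """Frequencies of the four two-locus haplotypes for a given D'.

    Hypothetical helper, not part of SIMLA. Alleles are D/d at the
    disease locus and M/m at the marker; d_prime is assumed to lie
    in [0, 1] (positive LD between D and M only).
    """
    if not 0.0 <= d_prime <= 1.0:
        raise ValueError("this sketch only handles 0 <= D' <= 1")
    # Largest positive D compatible with the allele frequencies.
    d_max = min(p_disease * (1 - p_marker), (1 - p_disease) * p_marker)
    d = d_prime * d_max
    return {
        ("D", "M"): p_disease * p_marker + d,
        ("D", "m"): p_disease * (1 - p_marker) - d,
        ("d", "M"): (1 - p_disease) * p_marker - d,
        ("d", "m"): (1 - p_disease) * (1 - p_marker) + d,
    }


def draw_haplotypes(freqs, n, rng):
    """Draw n independent founder haplotypes from the given frequencies."""
    haplotypes = list(freqs)
    weights = [freqs[h] for h in haplotypes]
    return rng.choices(haplotypes, weights=weights, k=n)


def estimate_d(sample):
    """Estimate D = P(DM) - P(D) * P(M) from a sample of haplotypes."""
    n = len(sample)
    p_dm = sum(1 for h in sample if h == ("D", "M")) / n
    p_d = sum(1 for h in sample if h[0] == "D") / n
    p_m = sum(1 for h in sample if h[1] == "M") / n
    return p_dm - p_d * p_m


if __name__ == "__main__":
    rng = random.Random(2004)  # fixed seed so the run is repeatable
    freqs = haplotype_freqs(p_disease=0.1, p_marker=0.3, d_prime=0.8)
    sample = draw_haplotypes(freqs, n=200_000, rng=rng)
    print("target haplotype frequencies:", freqs)
    print("estimated D from sample:", round(estimate_d(sample), 4))
```

In a full pedigree simulation, founder haplotypes drawn this way would then be passed to offspring with recombination between loci, and the whole process repeated over many replicates to estimate power and type I error; those steps are outside the scope of this sketch.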