Kaplan N, Morris R
Biostatistics Branch, National Institute of Environmental Health Sciences, P.O. Box 12233, Research Triangle Park, NC 27709-2233, USA.
Genet Epidemiol. 2001 May;20(4):432-57. doi: 10.1002/gepi.1012.
The usefulness of association studies for fine-mapping loci with common susceptibility alleles for complex genetic diseases in outbred populations is unclear. We investigate this issue for a battery of tightly linked anonymous genetic markers spanning a candidate region centered around a disease locus, and study the joint behavior of chi-square statistics used to discover and to localize the disease locus. We used simulation methods based on a coalescent process with mutation, recombination, and genetic drift to examine the spatial distribution of markers with large noncentrality parameters in a case-control study design. Simulations with a disease allele at intermediate frequency, presumably representing an old mutation, tend to exhibit the largest noncentrality parameter values at markers near the disease locus. In contrast, simulations with a disease allele at low frequency, presumably representing a young mutation, often exhibit the largest noncentrality parameter values at markers scattered over the candidate region. In the former case, sample sizes or marker densities sufficient to detect association are likely to lead to useful localization, whereas in the latter case, localization of the disease locus within the candidate region is much less likely, regardless of the sample size or density of the map. The effects of increasing sample size or marker density are also investigated. Based on a single-marker analysis, we find that a simple strategy of choosing the marker with the smallest associated P value to begin a laboratory search for the disease locus performs adequately for a common disease allele. We also investigated a strategy of pooling nearby sites to form multiple-allele markers. Using multiple-degree-of-freedom chi-square tests on markers formed from two or three nearby sites, we found no clear advantage of this form of pooling over a single-marker analysis. Genet. Epidemiol. 20:432-457, 2001. Published by Wiley-Liss, 2001.
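As a hedged illustration only (not the authors' coalescent simulation code), the noncentrality parameter of Pearson's chi-square test on a case-control allele-count table can be computed from assumed population allele frequencies in cases and controls. The function names `pearson_ncp` and `best_marker` and the frequencies in the usage example are hypothetical. The same 2 × k formula covers a single biallelic marker (1 degree of freedom) and a pooled multiple-allele marker built from nearby sites (k − 1 degrees of freedom).

```python
from typing import Dict, Sequence, Tuple


def pearson_ncp(case_freqs: Sequence[float], control_freqs: Sequence[float],
                n_case_chrom: int, n_control_chrom: int) -> float:
    """Noncentrality parameter of Pearson's chi-square test on a 2 x k table
    (case/control rows by allele or haplotype columns).

    lambda = N * sum_ij (pi_ij - r_i * c_j)^2 / (r_i * c_j), where N is the total
    number of sampled chromosomes, r_i the case/control shares of N, and c_j the
    pooled allele frequencies. The test has k - 1 degrees of freedom.
    """
    if len(case_freqs) != len(control_freqs):
        raise ValueError("case and control frequency vectors must have equal length")
    for freqs in (case_freqs, control_freqs):
        if abs(sum(freqs) - 1.0) > 1e-9:
            raise ValueError("allele frequencies must sum to 1")
    n = n_case_chrom + n_control_chrom
    rows = [(n_case_chrom / n, case_freqs), (n_control_chrom / n, control_freqs)]
    # Column marginals: allele frequencies pooled over cases and controls.
    pooled = [rows[0][0] * a + rows[1][0] * b
              for a, b in zip(case_freqs, control_freqs)]
    lam = 0.0
    for share, freqs in rows:
        for f, c in zip(freqs, pooled):
            if c > 0:
                expected = share * c           # r_i * c_j under no association
                lam += (share * f - expected) ** 2 / expected
    return n * lam


def best_marker(markers: Dict[str, Tuple[Sequence[float], Sequence[float]]],
                n_case_chrom: int, n_control_chrom: int) -> Tuple[str, float]:
    """Return (name, ncp) for the marker with the largest noncentrality parameter.

    For markers with equal degrees of freedom, a larger NCP means a smaller
    expected P value, so this mirrors the 'smallest P value' strategy in expectation.
    """
    scored = {name: pearson_ncp(cf, kf, n_case_chrom, n_control_chrom)
              for name, (cf, kf) in markers.items()}
    name = max(scored, key=scored.get)
    return name, scored[name]


if __name__ == "__main__":
    # Hypothetical biallelic markers: 500 cases and 500 controls (1000 chromosomes each).
    markers = {"A": ([0.30, 0.70], [0.20, 0.80]),
               "B": ([0.22, 0.78], [0.20, 0.80])}
    name, ncp = best_marker(markers, 1000, 1000)
    print(name, round(ncp, 2))  # prints "A 26.67"
```

NCP values are only comparable as a power ranking when the tests share the same degrees of freedom; comparing a 1-df single-marker test with a 3-df pooled test would additionally require the noncentral chi-square distribution (for example, `scipy.stats.ncx2`).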