Cho Kelly, Dupuis Josée
Departments of Genetics and Biostatistics, Yale University Schools of Medicine and Public Health, New Haven, CT 06520-8034, USA.
BMC Genet. 2009 Aug 10;10:44. doi: 10.1186/1471-2156-10-44.
In affected sibling pair linkage analysis, the presence of linkage disequilibrium (LD) has been shown to lead to overestimation of the number of alleles shared identity-by-descent (IBD) among sibling pairs when parents are ungenotyped. This inflation results in spurious evidence for linkage even when the markers and the disease locus are not linked. In our study, we first theoretically evaluate how inflation in IBD probabilities leads to overestimation of a nonparametric linkage (NPL) statistic when linkage equilibrium is assumed. Motivated by the inflation of the expected logarithm of the odds (LOD) score observed in this theoretical exploration, we then propose and implement a two-step processing strategy to systematically evaluate approaches for handling LD. Step 1 involves three techniques to filter a dense set of markers. In step 2, we take the subset of markers selected in step 1 and apply four different methods of handling LD among dense markers: 1) marker thinning (MT); 2) recursive elimination (RE); 3) SNPLINK; and 4) the LD modeling approach in MERLIN. We evaluate the relative performance of each method through simulation.
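To make the idea of an LD threshold concrete, the following is a minimal sketch of filtering markers by pairwise squared correlation (r²). It is not the authors' implementation and is not the RE, SNPLINK, or MERLIN procedure: it assumes phased, biallelic haplotypes coded 0/1 and uses a simple greedy rule, and the function names `r_squared` and `prune_by_r2` are hypothetical.

```python
from typing import List, Sequence


def r_squared(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Squared correlation between two biallelic markers.

    Each argument lists the allele (0 or 1) carried at one marker by the
    same set of haplotypes. Uses r^2 = D^2 / (pA(1-pA) pB(1-pB)),
    where D = pAB - pA * pB.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("markers must cover the same non-empty haplotype set")
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # A monomorphic marker carries no LD information.
        return 0.0
    d = p_ab - p_a * p_b
    return d * d / denom


def prune_by_r2(markers: List[Sequence[int]], max_r2: float) -> List[int]:
    """Greedy filter (an illustrative assumption, not the paper's method).

    Walk the markers in map order and keep a marker only if its r^2 with
    every marker already kept is at most max_r2. Returns kept indices.
    """
    kept: List[int] = []
    for i, marker in enumerate(markers):
        if all(r_squared(marker, markers[j]) <= max_r2 for j in kept):
            kept.append(i)
    return kept
```

For example, with three markers where the first two are in perfect LD, `prune_by_r2(markers, 0.3)` would keep the first and third and drop the second.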
We observed LOD score inflation only when the parents were ungenotyped. For a given number of markers, all approaches evaluated performed similarly under each type of LD threshold; however, the recursive elimination (RE) approach was the only one that eliminated the LOD score bias. Our simulation results indicate a reduction in LOD score inflation ranging from approximately 75% to complete elimination, with the information content (IC) maintained, when the tolerable LD threshold on the squared correlation coefficient (r²) is set above 0.3 or when MT is applied at 2 SNPs per cM.
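The marker density of 2 SNPs per cM can be illustrated with a short sketch of thinning by genetic map position. This is an assumption about how such thinning might be done, not a description of the procedure used in the study: it keeps a marker only if it lies at least 1/density cM beyond the last kept marker, and the name `thin_markers` is hypothetical.

```python
from typing import List, Sequence


def thin_markers(positions_cm: Sequence[float], snps_per_cm: float) -> List[int]:
    """Thin a sorted marker map to roughly snps_per_cm markers per cM.

    positions_cm must be sorted in ascending order (genetic map positions
    in centimorgans). A marker is kept when it is at least 1 / snps_per_cm
    cM away from the previously kept marker. Returns kept indices.
    """
    if snps_per_cm <= 0:
        raise ValueError("snps_per_cm must be positive")
    min_spacing = 1.0 / snps_per_cm
    kept: List[int] = []
    last_pos = None
    for i, pos in enumerate(positions_cm):
        if last_pos is None or pos - last_pos >= min_spacing:
            kept.append(i)
            last_pos = pos
    return kept
```

With `snps_per_cm=2`, the minimum spacing is 0.5 cM, so markers at 0.0, 0.1, 0.5, 0.7, 1.0 and 1.6 cM would be thinned to those at 0.0, 0.5, 1.0 and 1.6 cM.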
We have established a theoretical basis for how inflated IBD information among dense markers leads to overestimation of an NPL statistic. The two-step processing strategy serves as a useful framework for systematically evaluating the relative performance of different methods of handling LD.