Igo Robert P, Wijsman Ellen M
Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA.
Genet Epidemiol. 2008 Feb;32(2):119-31. doi: 10.1002/gepi.20267.
Variance-components (VC) linkage analysis is a powerful model-free method for assessing linkage, but the distribution of VC logarithm of the odds ratio (LOD) scores may deviate substantially from the assumed asymptotic distribution. Typically, the null distribution of the VC-LOD score and other linkage statistics has been estimated by generating new genotype data independently of the trait data and computing a linkage statistic for each of many such marker-simulated data sets. However, marker simulation is susceptible to errors in the assumed marker and map model and is computationally intensive. Here, we describe a method for generating posterior distributions of linkage statistics through simulation of trait data based on the original sample and on results from an initial scan using a Bayesian Markov-chain Monte Carlo (MCMC) approach for oligogenic segregation analysis. We use samples of oligogenic trait models taken from the posterior distribution to generate new samples of trait data, which are paired with the original marker data for analysis. Empirical P-values obtained from trait and marker simulation were similar, both for several strong linkage signals from published linkage scans and for data with a known, simulated trait model. Furthermore, trait simulation produces the expected null distribution of VC-LOD scores and is computationally fast when marker identity-by-descent estimates from the original data can be reused. These results suggest that trait simulation gives valid estimates of the statistical significance of linkage signals. Finally, these results also demonstrate the feasibility of obtaining empirical significance levels for evaluating Bayesian oligogenic linkage signals with either marker or trait simulation.
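As an illustration of the final step only, the sketch below shows how an empirical P-value might be computed once null replicates of a linkage statistic are available from either marker or trait simulation. It is not the authors' implementation. The function names `empirical_p_value` and `draw_toy_null_lod` are hypothetical, the add-one correction is one common convention rather than something stated in the abstract, and the toy null draws use the textbook asymptotic form for a single variance component (a 50:50 mixture of zero and a one-degree-of-freedom chi-square, converted to the LOD scale) purely as a stand-in for simulated data.

```python
import math
import random


def empirical_p_value(observed_lod, simulated_lods):
    """Share of null replicates at least as large as the observed statistic.

    Uses the add-one correction (an assumption of this sketch), so the
    estimate is never exactly zero with a finite number of replicates.
    """
    if not simulated_lods:
        raise ValueError("need at least one simulated statistic")
    exceed = sum(1 for lod in simulated_lods if lod >= observed_lod)
    return (exceed + 1) / (len(simulated_lods) + 1)


def draw_toy_null_lod(rng):
    """Toy stand-in for one simulated null LOD score (hypothetical helper).

    Half the time the score is 0; otherwise it is a chi-square(1) draw
    divided by 2*ln(10) to move from the likelihood-ratio scale to LOD units.
    """
    if rng.random() < 0.5:
        return 0.0
    z = rng.gauss(0.0, 1.0)
    return (z * z) / (2.0 * math.log(10.0))


if __name__ == "__main__":
    rng = random.Random(2008)  # fixed seed so the toy run is repeatable
    null_lods = [draw_toy_null_lod(rng) for _ in range(10000)]
    observed = 3.0  # illustrative value, not taken from the paper
    print(empirical_p_value(observed, null_lods))
```

In a real analysis, `null_lods` would hold the statistics computed from the simulated data sets rather than draws from the asymptotic form, since the point of the simulation is that the actual null distribution may depart from it.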