Mandal Diptasri M, Sorant Alexa J M, Atwood Larry D, Wilson Alexander F, Bailey-Wilson Joan E
Department of Genetics, Louisiana State University Health Sciences Center, CSRB 6-16, New Orleans, LA 70112, USA.
BMC Genet. 2006 Apr 20;7:21. doi: 10.1186/1471-2156-7-21.
Studies of model-based linkage analysis show that misspecification of the trait or marker model leads to decreased power or an increased Type I error rate. An increase in Type I error rate is seen when marker-related parameters (e.g., allele frequencies) are misspecified and ascertainment is through the trait, but lod-score methods are expected to be robust when ascertainment is random (as is often the case in linkage studies of quantitative traits). In previous studies, the power of lod-score linkage analysis using the "correct" generating model for the trait was found to increase when the marker allele frequencies were misspecified and parental data were missing. An investigation of Type I error rates, conducted in the absence of parental genotype data and with misspecification of marker allele frequencies, showed that an inflation in Type I error rate was the cause of at least part of this apparent increase in power. To investigate whether the observed inflation in Type I error rate in model-based lod-score linkage was due to sampling variation, the trait model was estimated from each sample using REGCHUNT, an automated segregation analysis program that fits models by maximum likelihood using many different sets of initial parameter estimates.
The Type I error rates observed using the trait models generated by REGCHUNT were usually closer to the nominal levels than those obtained when assuming the generating trait model.
This suggests that the observed inflation of Type I error upon misspecification of marker allele frequencies is at least partially due to sampling variation. Thus, with missing parental genotype data, lod-score linkage is not as robust to misspecification of marker allele frequencies as has been commonly thought.
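As a minimal sketch of how an empirical Type I error rate can be compared with its nominal level, the following Python example counts the fraction of null (no-linkage) replicates whose lod score reaches a threshold and checks whether the nominal level falls below an approximate 95% Wilson interval. The function names, the threshold, and the replicate values are hypothetical illustrations, not the procedure or data used in the study.

```python
import math


def empirical_type1_error(null_lods, threshold):
    """Fraction of null (no-linkage) replicates with lod score >= threshold."""
    if not null_lods:
        raise ValueError("need at least one replicate")
    rejections = sum(1 for lod in null_lods if lod >= threshold)
    return rejections / len(null_lods)


def wilson_interval(rate, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    denom = 1 + z * z / n
    centre = (rate + z * z / (2 * n)) / denom
    half = z * math.sqrt(rate * (1 - rate) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def is_inflated(null_lods, threshold, nominal):
    """True when the nominal level lies below the interval's lower bound."""
    rate = empirical_type1_error(null_lods, threshold)
    low, _ = wilson_interval(rate, len(null_lods))
    return low > nominal


# Hypothetical replicates: 20 of 100 null samples exceed the threshold.
lods = [1.0] * 20 + [0.0] * 80
print(empirical_type1_error(lods, threshold=0.59))
print(is_inflated(lods, threshold=0.59, nominal=0.05))
```

An interval-based check of this kind only separates inflation from Monte Carlo noise in the number of replicates; it does not by itself address sampling variation in the trait model, which is the question the study examines by re-estimating the model from each sample.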