Souverein O W, Zwinderman A H, Tanck M W T
Department of Clinical Epidemiology and Biostatistics, Academic Medical Center, Amsterdam, the Netherlands.
Ann Hum Genet. 2006 May;70(Pt 3):372-81. doi: 10.1111/j.1529-8817.2005.00236.x.
The objective of this study was to investigate the performance of multiple imputation of missing genotype data for unrelated individuals using the polytomous logistic regression model, focusing on different missingness mechanisms, percentages of missing data, and imputation models. A complete dataset of 581 individuals, each analysed for eight biallelic polymorphisms and the quantitative phenotype HDL-C, was used. From this dataset one hundred replicates with missing data were created, in different ways for different scenarios. The performance was assessed by comparing the mean bias in parameter estimates, the root mean squared standard errors, and the genotype-imputation error rates. Overall, the mean bias was small in all scenarios, and in most scenarios the mean did not differ significantly from 'no bias'. Including polymorphisms that are highly correlated in the imputation model reduced the genotype-imputation error rate and increased precision of the parameter estimates. The method works well for data that are missing completely at random, and for data that are missing at random. In conclusion, our results indicate that multiple imputation with the polytomous logistic regression model can be used for association studies to deal with the problem of missing genotype data, when attention is paid to the imputation model and the percentage of missing data.
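As a rough illustration of the workflow the abstract describes, the sketch below shows two generic steps of multiple imputation for a biallelic genotype: drawing imputed genotypes from category probabilities, and pooling the per-imputation estimates. It rests on assumptions that are not in the abstract. The abstract does not state the software, the number of imputations, or the pooling rule, so Rubin's rules are assumed here as the conventional choice. The function names `draw_genotypes` and `pool_rubin`, the 0/1/2 genotype coding, and all numeric values are hypothetical. The genotype probabilities are taken as given; in practice they would come from a fitted polytomous logistic regression model, which is not implemented here.

```python
import math
import random
import statistics


def draw_genotypes(probabilities, m, seed=0):
    """Draw m imputed genotypes for one missing value.

    probabilities: predicted probabilities for genotypes coded 0, 1, 2
    (assumed to come from an already fitted polytomous logistic model).
    Returns one draw per imputed dataset.
    """
    rng = random.Random(seed)
    return rng.choices([0, 1, 2], weights=probabilities, k=m)


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules (assumed pooling rule).

    estimates: one parameter estimate per imputed dataset.
    variances: the matching squared standard errors.
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)        # mean within-imputation variance
    between = statistics.variance(estimates)    # between-imputation variance
    total = within + (1 + 1 / m) * between      # total variance
    return pooled, math.sqrt(total)


# Hypothetical values, for illustration only.
imputed = draw_genotypes([0.6, 0.3, 0.1], m=5)
estimates = [0.10, 0.12, 0.08, 0.11, 0.09]
std_errors = [0.05] * 5
pooled, pooled_se = pool_rubin(estimates, [se ** 2 for se in std_errors])
print(f"imputed genotypes: {imputed}")
print(f"pooled estimate {pooled:.3f}, pooled SE {pooled_se:.3f}")
```

The pooled standard error combines the average within-imputation variance with the spread of the estimates across imputed datasets, so uncertainty about the missing genotypes widens the interval rather than being ignored.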