Lamina Claudia, Bongardt Friedhelm, Küchenhoff Helmut, Heid Iris M
Institute of Epidemiology, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg, Germany.
PLoS One. 2008 Mar 26;3(3):e1853. doi: 10.1371/journal.pone.0001853.
Statistically reconstructing haplotypes from single nucleotide polymorphism (SNP) genotypes can lead to falsely classified haplotypes. This can be an issue when interpreting haplotype association results or when selecting subjects with certain haplotypes for subsequent functional studies. Our aim was to quantify haplotype reconstruction error and to provide tools for measuring it.
Using numerous simulation scenarios, we systematically investigated several error measures, including discrepancy, error rate, and R², and introduced sensitivity and specificity to this context. We exemplified several measures in the KORA study, a large population-based study from Southern Germany. We found that specificity was slightly reduced only for common haplotypes, while sensitivity was decreased for some, but not all, rare haplotypes. The overall error rate generally increased with an increasing number of loci, increasing minor allele frequency of the SNPs, decreasing correlation between the alleles, and increasing ambiguity.
We conclude that, with the analytical approach presented here, haplotype-specific error measures can be computed to gain insight into haplotype uncertainty. The method indicates whether a specific risk haplotype can be expected to be reconstructed with little or with substantial misclassification, and thus indicates the magnitude of the expected bias in association estimates. We also illustrate that sensitivity and specificity separate two dimensions of the haplotype reconstruction error; together they completely describe the misclassification matrix and thus provide the prerequisite for methods that account for misclassification.
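As an illustration of the two haplotype-specific measures, the following minimal Python sketch computes sensitivity and specificity for one haplotype from paired true and reconstructed haplotypes. It rests on assumptions that are not taken from the paper: the true haplotypes are known (as they would be in a simulation), each chromosome is counted as one observation, and the function name `haplotype_sens_spec` and the toy data are hypothetical. It is not the authors' implementation.

```python
def haplotype_sens_spec(true_haps, recon_haps, target):
    """Return (sensitivity, specificity) of reconstruction for one haplotype.

    Assumption: true_haps[i] and recon_haps[i] describe the same chromosome,
    and every chromosome counts as one observation.
    """
    if len(true_haps) != len(recon_haps):
        raise ValueError("true and reconstructed lists must have equal length")

    tp = fn = fp = tn = 0
    for true_h, recon_h in zip(true_haps, recon_haps):
        if true_h == target and recon_h == target:
            tp += 1  # target haplotype correctly reconstructed
        elif true_h == target:
            fn += 1  # target haplotype missed
        elif recon_h == target:
            fp += 1  # another haplotype wrongly reconstructed as target
        else:
            tn += 1  # another haplotype, not reconstructed as target

    # Undefined ratios (no chromosomes in a class) are reported as NaN.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data: one rare haplotype "AT" is reconstructed as the
# common haplotype "AC".
true_haps = ["AC", "AC", "GT", "GT", "AT", "AC", "GT", "AC"]
recon_haps = ["AC", "AC", "GT", "GT", "AC", "AC", "GT", "AC"]

for hap in ("AC", "AT"):
    sens, spec = haplotype_sens_spec(true_haps, recon_haps, hap)
    print(f"{hap}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy data the single misclassified chromosome lowers the sensitivity of the rare haplotype "AT" and the specificity of the common haplotype "AC", so the two measures capture different sides of the same reconstruction error.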