Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel.
Bioinformatics Research, National Marrow Donor Program, Minneapolis, MN, USA.
Immunogenetics. 2018 May;70(5):279-292. doi: 10.1007/s00251-017-1040-4. Epub 2017 Nov 9.
Regardless of sampling depth, accurate genotype imputation is limited in regions of high polymorphism, which often have a heavy-tailed haplotype frequency distribution. Many rare haplotypes are thus unobserved. Statistical methods that improve imputation by extending reference haplotype distributions, using linkage disequilibrium patterns that relate allele and haplotype frequencies, have not yet been explored. In the field of unrelated stem cell transplantation, imputation of highly polymorphic human leukocyte antigen (HLA) genes has an important application in identifying the best-matched stem cell donor when searching large registries totaling over 28,000,000 donors worldwide. Despite these large registry sizes, a significant proportion of searched patients present novel HLA haplotypes. Supporting this observation, HLA population genetic models have indicated that many extant HLA haplotypes remain unobserved. These absent haplotypes are a significant cause of error in haplotype matching. We have applied a Bayesian inference methodology for extending haplotype frequency distributions, using a model in which new haplotypes are created by recombination of observed alleles. Applications of this joint probability model offer significant improvement in frequency distribution estimates over the best existing alternative methods, as we illustrate using five-locus HLA frequency data from the National Marrow Donor Program registry. Transplant matching algorithms and disease association studies involving phasing and imputation of rare variants may benefit from this statistical inference framework.
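The idea of giving probability mass to haplotypes that were never observed, by recombining alleles that were, can be illustrated with a toy Python sketch. This is not the Bayesian model described in the abstract: it swaps in plain linear interpolation between the observed haplotype frequencies and a product-of-marginal-allele-frequencies (linkage equilibrium) distribution. The function name `extend_haplotype_frequencies`, the mixing weight `novel_mass`, and the two-locus counts are all assumptions made up for illustration; they do not come from the paper or from NMDP data.

```python
from collections import Counter
from itertools import product


def extend_haplotype_frequencies(counts, novel_mass=0.05):
    """Mix observed haplotype frequencies with a product-of-marginals term.

    counts: dict mapping a haplotype (tuple of alleles, one per locus)
            to its observed count.
    novel_mass: hypothetical weight given to the recombination term.
    """
    total = sum(counts.values())
    n_loci = len(next(iter(counts)))

    # Marginal allele frequencies at each locus.
    marginals = [Counter() for _ in range(n_loci)]
    for hap, n in counts.items():
        for locus, allele in enumerate(hap):
            marginals[locus][allele] += n / total

    extended = {}
    # Every combination of observed alleles is a candidate haplotype.
    for hap in product(*(m.keys() for m in marginals)):
        observed = counts.get(hap, 0) / total
        recombinant = 1.0
        for locus, allele in enumerate(hap):
            recombinant *= marginals[locus][allele]
        extended[hap] = (1 - novel_mass) * observed + novel_mass * recombinant
    return extended


# Made-up two-locus counts, for illustration only (not registry data).
toy_counts = {
    ("A*01:01", "B*08:01"): 60,
    ("A*02:01", "B*07:02"): 30,
    ("A*01:01", "B*07:02"): 10,
}
extended = extend_haplotype_frequencies(toy_counts)
```

In this toy input the combination `("A*02:01", "B*08:01")` is never observed, yet it receives a small nonzero frequency in `extended`, and the extended frequencies still sum to one. A model such as the one the abstract describes would instead infer how much mass to assign from the data rather than fixing it by hand.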