Madbouly A, Gragert L, Freeman J, Leahy N, Gourraud P-A, Hollenbach J A, Kamoun M, Fernandez-Vina M, Maiers M
Bioinformatics Research, National Marrow Donor Program, Minneapolis, MN, USA.
Tissue Antigens. 2014 Sep;84(3):285-92. doi: 10.1111/tan.12390. Epub 2014 Jul 11.
Genetic matching for loci in the human leukocyte antigen (HLA) region between a donor and a patient in hematopoietic stem cell transplantation (HSCT) is critical to outcome; however, methods for HLA genotyping of donors in unrelated stem cell registries often yield results with allelic and phase ambiguity and/or do not query all clinically relevant loci. We present and evaluate a statistical method for in silico imputation of HLA alleles and haplotypes in large ambiguous population data from the Be The Match® Registry. Our method builds on haplotype frequencies estimated from registry populations and exploits patterns of linkage disequilibrium (LD) across HLA haplotypes to infer high resolution HLA assignments. We performed validation on simulated and real population data from the Registry with non-trivial ambiguity content. While real population datasets caused some predictions to deviate from expectation, validations still showed high percent recall for imputed results with average recall >76% when imputing HLA alleles from registry data. We simulated ambiguity generated by several HLA genotyping methods to evaluate the imputation performance on several levels of typing resolution. On average, imputation percent recall of allele-level HLA haplotypes was >95% for allele-level typing, >92% for intermediate resolution typing and >58% for serology (low-resolution) typing. Thus, allele-level HLA assignments can be imputed through the application of a set of statistical and population genetics inferences and with knowledge of haplotype frequencies and self-identified race and ethnicities.
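The general idea of frequency-based imputation can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes Hardy-Weinberg proportions within a single population, uses only two loci, and the frequency table `HAP_FREQ`, the typing `TYPING`, and the function name `impute_haplotype_pairs` are hypothetical values and names chosen for illustration. Each ambiguous typing is given as a pair of candidate allele sets per locus, and every haplotype pair consistent with the typing is weighted by the product of its haplotype frequencies.

```python
from itertools import combinations_with_replacement

# Hypothetical two-locus (A, B) haplotype frequencies for one population.
# The numbers are illustrative only and are NOT registry estimates.
HAP_FREQ = {
    ("A*01:01", "B*08:01"): 0.10,
    ("A*02:01", "B*07:02"): 0.06,
    ("A*01:01", "B*07:02"): 0.01,
    ("A*02:01", "B*08:01"): 0.005,
    ("A*02:05", "B*08:01"): 0.001,
}

# Hypothetical ambiguous, unphased typing: for each locus, one set of
# candidate alleles per chromosome copy.
TYPING = [
    ({"A*01:01"}, {"A*02:01", "A*02:05"}),  # locus A
    ({"B*07:02"}, {"B*08:01"}),             # locus B
]


def consistent(h1, h2, typing):
    """True if the haplotype pair can explain the typing at every locus."""
    for locus, (set1, set2) in enumerate(typing):
        a, b = h1[locus], h2[locus]
        if not ((a in set1 and b in set2) or (a in set2 and b in set1)):
            return False
    return True


def impute_haplotype_pairs(typing, hap_freq):
    """Return {(h1, h2): posterior probability} over consistent pairs.

    Assumes Hardy-Weinberg proportions: an unordered pair has weight
    f(h1) * f(h2), doubled when the two haplotypes differ.
    """
    weights = {}
    for h1, h2 in combinations_with_replacement(sorted(hap_freq), 2):
        if consistent(h1, h2, typing):
            w = hap_freq[h1] * hap_freq[h2]
            weights[(h1, h2)] = w if h1 == h2 else 2 * w
    total = sum(weights.values())
    if total == 0:
        return {}  # no haplotype pair in the table explains the typing
    return {pair: w / total for pair, w in weights.items()}


posterior = impute_haplotype_pairs(TYPING, HAP_FREQ)
for pair, prob in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(pair, round(prob, 4))
```

In this toy table the pair made of the two most frequent haplotypes dominates the posterior, which is how linkage disequilibrium lets a frequency table resolve both the allelic ambiguity at locus A and the phase between loci. A registry-scale method would additionally need more loci, population-specific frequency tables, and handling of typings that no known haplotype pair explains.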