Kuniholm M H, Xie X, Anastos K, Xue X, Reimers L, French A L, Gange S J, Kassaye S G, Kovacs A, Wang T, Aouizerat B E, Strickler H D
Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA.
Department of Medicine, Montefiore Medical Center, Bronx, NY, USA.
Int J Immunogenet. 2016 Dec;43(6):369-375. doi: 10.1111/iji.12292. Epub 2016 Oct 24.
Human leucocyte antigen (HLA) genes play a central role in the response to pathogens and in autoimmunity. Research into the effects of HLA genes on health has been limited because HLA genotyping protocols are labour intensive and expensive. Recently, algorithms that impute HLA genotypes from genome-wide association study (GWAS) data have been published. However, the reported imputation accuracy of most of these algorithms was based primarily on training data sets of European ancestry individuals. We evaluated the performance of two HLA-dedicated imputation algorithms, SNP2HLA and HIBAG, in a multiracial population of 1587 women whose HLA genotypes had been determined by gold standard methods. We first compared the accuracy (defined as the percentage of correctly predicted alleles) of HLA-B and HLA-C imputation by SNP2HLA and HIBAG, splitting the data set into an 80% training group and a 20% testing group. Estimates of accuracy for HIBAG were the same as or better than those for SNP2HLA. We then conducted a more thorough test of HIBAG imputation accuracy using five independent 10-fold cross-validation procedures, with ancestry groups delineated using ancestry informative markers. Overall accuracy for HIBAG was 89%. Accuracy by HLA gene was 93% for HLA-A, 84% for HLA-B, 94% for HLA-C, 83% for HLA-DQA1, 91% for HLA-DQB1 and 88% for HLA-DRB1. Accuracy was highest in the African ancestry group (the largest group) and lowest in the Hispanic group (the smallest group). Despite suboptimal imputation accuracy for some HLA gene/ancestry group combinations, the HIBAG algorithm has the advantage of providing posterior estimates of accuracy, which enable the investigator to restrict analysis to subsets of the population with high predicted (e.g. >95%) imputation accuracy.
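The accuracy measure used above (the percentage of correctly predicted alleles) and the idea of restricting analysis to high-confidence calls can be illustrated with a minimal Python sketch. This is not HIBAG's or SNP2HLA's own code or API. The function names, the record layout `(true_pair, predicted_pair, posterior)` and the toy genotypes are all hypothetical, and the sketch assumes that each person contributes two alleles per gene and that allele order within a genotype does not matter.

```python
from collections import Counter


def allele_matches(true_pair, pred_pair):
    """Count correctly predicted alleles (0, 1 or 2) for one person at one
    gene, ignoring the order of the two alleles."""
    remaining = Counter(true_pair)
    hits = 0
    for allele in pred_pair:
        if remaining[allele] > 0:
            remaining[allele] -= 1  # each true allele can be matched once
            hits += 1
    return hits


def allele_accuracy(records, min_posterior=0.0):
    """Return (accuracy, call_rate) for hypothetical records of the form
    (true_pair, pred_pair, posterior).

    accuracy  = correctly predicted alleles / alleles among retained people
    call_rate = fraction of people whose posterior is >= min_posterior
    """
    records = list(records)
    kept = [r for r in records if r[2] >= min_posterior]
    if not kept:
        return float("nan"), 0.0
    hits = sum(allele_matches(t, p) for t, p, _ in kept)
    return hits / (2 * len(kept)), len(kept) / len(records)


# Made-up genotypes for illustration only (not data from the study).
toy = [
    (("A*02:01", "A*03:01"), ("A*03:01", "A*02:01"), 0.99),  # 2 of 2 correct
    (("A*01:01", "A*24:02"), ("A*01:01", "A*23:01"), 0.60),  # 1 of 2 correct
    (("A*30:01", "A*30:01"), ("A*30:01", "A*30:02"), 0.97),  # 1 of 2 correct
    (("A*68:01", "A*02:01"), ("A*68:02", "A*02:05"), 0.40),  # 0 of 2 correct
]

print(allele_accuracy(toy))                      # (0.5, 1.0)
print(allele_accuracy(toy, min_posterior=0.95))  # (0.75, 0.5)
```

In this toy example, raising the posterior threshold improves accuracy among the retained individuals while halving the number of people available for analysis, which is the trade-off implied by analysing only high-confidence subsets.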