Center for Research in Transplantation and Translational Immunology, Nantes Université, INSERM, Ecole Centrale Nantes, Nantes, France.
Molecular Genetics and Bioinformatics Laboratory, School of Medicine, São Paulo State University, Botucatu, State of São Paulo, Brazil.
HLA. 2024 Jun;103(6):e15543. doi: 10.1111/tan.15543.
The MHC class I region contains genes crucial to the innate and adaptive immune responses and plays a key role in susceptibility to many autoimmune and infectious diseases. Genome-wide association studies have identified numerous disease-associated SNPs within this region. However, these associations do not fully capture the immune-biological relevance of specific HLA alleles. HLA imputation techniques can leverage existing SNP array data by predicting HLA genotypes from the linkage disequilibrium between SNPs and specific HLA alleles. Successful imputation requires large, diverse reference panels, especially for admixed populations. This study employed a bioinformatics approach to call SNPs and HLA alleles in multi-ethnic samples from the 1000 Genomes (1KG) dataset and in admixed individuals from Brazil (SABE), utilising 30X whole-genome sequencing data. Using HIBAG, we created three reference panels: 1KG (n = 2504), SABE (n = 1171), and the full model (n = 3675) encompassing all samples. In extensive cross-validation of these reference panels, the multi-ethnic 1KG reference performed better overall than the reference containing only Brazilian samples. However, the best results were achieved with the full model. Additionally, we expanded the scope of imputation by developing reference panels for the non-classical HLA genes as well as MICA, MICB and HLA-H, which were previously unavailable for multi-ethnic populations. Validation in an independent Brazilian dataset showed that our reference panels outperformed the Michigan Imputation Server, particularly in predicting HLA-B alleles among Brazilians. Our investigations underscore the need to enhance or adapt reference panels to encompass the target population's genetic diversity and emphasise the significance of multi-ethnic references for accurate imputation across different populations.
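As an illustration of how imputed HLA calls can be scored against sequencing-based calls during validation, the following is a minimal sketch in Python. The abstract does not state which accuracy metric was used, so the allele-level concordance below is an assumption, and the function names (`allele_matches`, `allele_accuracy`) and the toy genotypes are hypothetical and not taken from the study or from HIBAG.

```python
from collections import Counter


def allele_matches(true_pair, pred_pair):
    """Count how many of the two alleles agree, ignoring allele order."""
    # Multiset intersection handles homozygous calls correctly.
    overlap = Counter(true_pair) & Counter(pred_pair)
    return sum(overlap.values())


def allele_accuracy(truth, predicted):
    """Fraction of correctly imputed alleles over samples present in both inputs."""
    shared = truth.keys() & predicted.keys()
    if not shared:
        raise ValueError("no samples in common")
    correct = sum(allele_matches(truth[s], predicted[s]) for s in shared)
    return correct / (2 * len(shared))  # two alleles per diploid sample


# Made-up toy genotypes, for illustration only (not data from the study).
truth = {
    "s1": ("B*07:02", "B*44:03"),
    "s2": ("B*15:01", "B*15:01"),
    "s3": ("B*35:01", "B*51:01"),
}
predicted = {
    "s1": ("B*44:03", "B*07:02"),  # same alleles, different order
    "s2": ("B*15:01", "B*15:03"),  # one allele wrong
    "s3": ("B*35:01", "B*51:01"),
}

accuracy = allele_accuracy(truth, predicted)
print(f"allele-level accuracy: {accuracy:.3f}")
```

Counting matches per allele rather than per genotype gives partial credit when only one of the two alleles is imputed correctly; a stricter genotype-level score would count a sample as correct only when both alleles match.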