Nanjala Ruth, Mbiyavanga Mamana, Hashim Suhaila, de Villiers Santie, Mulder Nicola
Department of Biochemistry and Biotechnology, Pwani University, Kenya.
Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, South Africa.
bioRxiv. 2023 Jan 23:2023.01.23.525129. doi: 10.1101/2023.01.23.525129.
The Human Leukocyte Antigen (HLA) region plays an important role in autoimmune and infectious diseases. HLA is a highly polymorphic region and thus difficult to impute. We therefore sought to evaluate HLA imputation accuracy specifically in a West African population, since such populations are understudied and are known to harbor high genetic diversity. The study sets were selected from Gambian individuals within the Gambian Genome Variation Project (GGVP) Whole Genome Sequence datasets. Two different arrays, Illumina Omni 2.5 and Human Heredity and Health in Africa (H3Africa), were assessed for the appropriateness of their markers, and these were used to test several imputation panels and tools. The reference panels were chosen from the 1000 Genomes dataset (1kg-All), 1000 Genomes African dataset (1kg-Afr), 1000 Genomes Gambian dataset (1kg-Gwd), H3Africa dataset and the HLA Multi-ethnic dataset. HLA-A, HLA-B and HLA-C alleles were imputed using HIBAG, SNP2HLA, CookHLA and Minimac4, and concordance rate was used as the assessment metric. Overall, the best performing tool was HIBAG, with a concordance rate of 0.84, while the best performing reference panel was the H3Africa panel, with a concordance rate of 0.62. For HLA-B, Minimac4 (0.75) achieved higher imputation accuracy than HIBAG (0.71), SNP2HLA (0.51) and CookHLA (0.17). The H3Africa and Illumina Omni 2.5 arrays performed comparably, suggesting that the choice of genotyping array has little influence on HLA imputation in West African populations. The findings show that using a larger population-specific reference panel and the HIBAG tool improves the accuracy of HLA imputation in West African populations.
For studies that associate a particular HLA type with a phenotypic trait, for instance HIV susceptibility or control, genotype imputation remains the main method for acquiring a larger sample size. Genotype imputation, the process of inferring unobserved genotypes, is a statistical technique and thus deals with probabilities. In addition, the HLA region is highly variable and therefore difficult to impute. In view of this, it is important to assess HLA imputation accuracy, especially in African populations, because African genomes are highly diverse and such studies have rarely been conducted in these populations. This work highlights that using the HIBAG imputation tool and a larger population-specific reference panel increases HLA imputation accuracy in an African population.
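The abstract reports concordance rate as the assessment metric but does not define it. As an illustration only, the sketch below assumes one common definition: the fraction of true alleles at a locus that are recovered by imputation, compared per individual without regard to allele order. The function name `concordance_rate`, the input layout (sample ID mapped to a pair of allele names) and the example alleles are hypothetical and are not taken from the study.

```python
from collections import Counter


def concordance_rate(truth, imputed):
    """Fraction of true alleles matched by the imputed calls at one locus.

    Both arguments map a sample ID to a pair of allele names, e.g.
    {"s1": ("A*02:01", "A*30:01")}. This definition is an assumption;
    the study may have computed concordance differently.
    """
    matched = 0
    total = 0
    for sample, true_pair in truth.items():
        if sample not in imputed:
            continue  # only samples present in both sets are scored
        # Multiset intersection ignores allele order and counts a
        # homozygous call correctly (two copies must both match).
        overlap = Counter(true_pair) & Counter(imputed[sample])
        matched += sum(overlap.values())
        total += len(true_pair)
    if total == 0:
        raise ValueError("no samples shared between truth and imputed calls")
    return matched / total


# Hypothetical example: s1 fully concordant, s2 has one of two alleles right.
truth = {"s1": ("A*02:01", "A*30:01"), "s2": ("A*23:01", "A*23:01")}
imputed = {"s1": ("A*30:01", "A*02:01"), "s2": ("A*23:01", "A*68:02")}
print(concordance_rate(truth, imputed))  # 3 of 4 alleles match -> 0.75
```

Under this assumed definition, a value such as 0.84 would mean that 84% of the sequence-derived alleles were reproduced by the imputation tool.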