Nunes Kelly, Zheng Xiuwen, Torres Margareth, Moraes Maria Elisa, Piovezan Bruno Z, Pontes Gerlandia N, Kimura Lilian, Carnavalli Juliana E P, Mingroni Netto Regina C, Meyer Diogo
University of São Paulo, Department of Genetics and Evolutionary Biology, São Paulo, Brazil.
University of Washington, Department of Biostatistics, Seattle, WA, USA.
Hum Immunol. 2016 Mar;77(3):307-312. doi: 10.1016/j.humimm.2015.11.004. Epub 2015 Nov 12.
Methods to impute HLA alleles from dense single nucleotide polymorphism (SNP) data provide a valuable resource for association studies and evolutionary investigations of the MHC region. The availability of appropriate training sets is critical to the accuracy of HLA imputation, and the inclusion of samples of diverse ancestries is an important prerequisite in studies of admixed populations. We assessed the accuracy of HLA imputation using 1000 Genomes Project data as a training set, applying it to a highly admixed Brazilian population, the Quilombos from the state of São Paulo. To assess accuracy, we compared imputed and experimentally determined genotypes for 146 samples at 4 classical HLA loci. We found imputation accuracies of 82.9%, 81.8%, 94.8% and 86.6% for HLA-A, -B, -C and -DRB1, respectively (two-field resolution). Accuracy improved when we included a subset of Quilombo individuals in the training set. We conclude that the 1000 Genomes data are a valuable resource for constructing training sets, owing to their diversity of ancestries and the potential for a large overlap of SNPs with the target population. We also show that tailoring training sets to features of the target population substantially enhances imputation accuracy.
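The abstract does not state exactly how accuracy was computed. A common convention in HLA imputation studies, assumed here and not taken from the paper, is per-allele concordance at one locus. Each diploid genotype is treated as an unordered pair, alleles are truncated to two-field resolution (e.g. `A*02:01:01:01` becomes `A*02:01`), and the number of alleles shared between the imputed and the typed genotype (0, 1 or 2) is summed over all N samples and divided by 2N. The function names and the dictionary-of-tuples input format below are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Dict, Tuple

Genotype = Tuple[str, str]  # hypothetical format: two allele names, e.g. ("A*02:01", "A*24:02")


def to_two_field(allele: str) -> str:
    """Truncate an HLA allele name to two-field resolution, e.g. 'A*02:01:01:01' -> 'A*02:01'."""
    gene, sep, fields = allele.partition("*")
    return gene + sep + ":".join(fields.split(":")[:2])


def genotype_matches(imputed: Genotype, typed: Genotype) -> int:
    """Count alleles shared by two unordered diploid genotypes (0, 1 or 2)."""
    a = Counter(to_two_field(x) for x in imputed)
    b = Counter(to_two_field(x) for x in typed)
    # Multiset intersection keeps the minimum count of each allele,
    # so a homozygote only matches twice if both genotypes carry it twice.
    return sum((a & b).values())


def allele_accuracy(imputed_calls: Dict[str, Genotype],
                    typed_calls: Dict[str, Genotype]) -> float:
    """Fraction of the 2N alleles correctly imputed at one locus (assumed metric)."""
    correct = 0
    total = 0
    for sample, typed in typed_calls.items():
        if sample not in imputed_calls:
            continue  # skip samples without an imputed call
        correct += genotype_matches(imputed_calls[sample], typed)
        total += 2
    return correct / total if total else float("nan")


if __name__ == "__main__":
    # Toy data for illustration only; not from the study.
    imputed = {"s1": ("A*02:01", "A*24:02"), "s2": ("A*01:01", "A*01:01")}
    typed = {"s1": ("A*02:01:01", "A*03:01"), "s2": ("A*01:01:01:01", "A*02:01")}
    print(allele_accuracy(imputed, typed))  # 2 of 4 alleles match -> 0.5
```

Counting matches with a multiset intersection, rather than comparing allele positions, avoids penalizing a correct genotype whose two alleles happen to be listed in the opposite order.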