Department of Statistics, University of Oxford, Oxford, UK.
Bioinformatics. 2011 Apr 1;27(7):968-72. doi: 10.1093/bioinformatics/btr061. Epub 2011 Feb 7.
Genetic variation at classical HLA alleles influences many phenotypes, including susceptibility to autoimmune disease, resistance to pathogens and the risk of adverse drug reactions. However, classical HLA typing methods are often prohibitively expensive for large-scale studies. We previously described a method for imputing classical alleles from linked SNP genotype data. Here, we present a modified version of the original algorithm, implemented in a freely available software suite that combines local data preparation and quality control (QC) with probabilistic imputation on a remote server.
We introduce two modifications to the original algorithm. First, we present a novel SNP selection function that leads to pronounced increases in call rate (by as much as 40% in some scenarios). Second, we develop a parallelized model-building algorithm that allows us to process a reference set of over 2500 individuals. In a validation experiment, we show that our framework produces highly accurate HLA type imputations at class I and class II loci for independent datasets: at call rates of 95-99%, imputation accuracy is between 92% and 98% at the four-digit level and over 97% at the two-digit level. We demonstrate the utility of the method through analysis of a genome-wide association study for psoriasis, where there is a known classical HLA risk allele (HLA-C*06:02). We find that the imputed allele is more strongly associated with disease than any single SNP within the region. The imputation framework, HLA*IMP, provides a powerful tool for dissecting the architecture of genetic risk within the HLA.
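The trade-off between call rate and accuracy, and the gap between four-digit and two-digit accuracy, can be illustrated with a small sketch. It assumes that a call is made only when the posterior probability of the most likely allele reaches a threshold, and that accuracy is measured over the calls actually made. The `Imputation` record, the helper names, the threshold values and the toy alleles are hypothetical illustrations, not HLA*IMP's interface or data, and each record is treated as a single allele call, ignoring the pairing of the two alleles in a diploid genotype.

```python
from dataclasses import dataclass


@dataclass
class Imputation:
    """Hypothetical record: one imputed allele call checked against classical typing."""
    best_allele: str   # most probable imputed allele, e.g. "C*06:02"
    posterior: float   # posterior probability assigned to best_allele
    true_allele: str   # allele from classical HLA typing (validation truth)


def two_digit(allele: str) -> str:
    """Truncate a four-digit name such as 'C*06:02' to its two-digit group 'C*06'."""
    return allele.split(":")[0]


def call_rate_and_accuracy(calls, threshold, digits=4):
    """Return (call_rate, accuracy).

    Calls whose posterior is below `threshold` are left uncalled; accuracy is
    the fraction of the remaining calls that match the true allele at the
    requested resolution (4 = full name, 2 = first field only).
    """
    made = [c for c in calls if c.posterior >= threshold]
    call_rate = len(made) / len(calls) if calls else 0.0
    if not made:
        return call_rate, float("nan")
    if digits == 2:
        correct = sum(two_digit(c.best_allele) == two_digit(c.true_allele) for c in made)
    else:
        correct = sum(c.best_allele == c.true_allele for c in made)
    return call_rate, correct / len(made)


# Toy, made-up calls for illustration only (not results from the paper).
calls = [
    Imputation("C*06:02", 0.99, "C*06:02"),
    Imputation("C*07:01", 0.95, "C*07:02"),  # wrong at four digits, right at two
    Imputation("A*02:01", 0.60, "A*02:01"),  # low confidence: dropped at threshold 0.7
    Imputation("B*08:01", 0.90, "B*08:01"),
]

print(call_rate_and_accuracy(calls, threshold=0.7))             # four-digit
print(call_rate_and_accuracy(calls, threshold=0.7, digits=2))   # two-digit
print(call_rate_and_accuracy(calls, threshold=0.5))             # lower threshold
```

With these toy calls, a threshold of 0.7 gives a call rate of 0.75 with four-digit accuracy 2/3 and two-digit accuracy 1.0, while lowering the threshold to 0.5 calls every record. Because the two-digit comparison ignores the second field, an error such as C*07:01 versus C*07:02 still counts as correct at two-digit resolution, which is why two-digit accuracy can only match or exceed four-digit accuracy on the same calls.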
HLA*IMP, implemented in C++ and Perl, is available from http://oxfordhla.well.ox.ac.uk and is free for academic use.