Accurate HLA type inference using a weighted similarity graph.

Affiliation

Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA.

Publication information

BMC Bioinformatics. 2010 Dec 14;11 Suppl 11(Suppl 11):S10. doi: 10.1186/1471-2105-11-S11-S10.

DOI: 10.1186/1471-2105-11-S11-S10

PMID: 21172045

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3024871/

Abstract

BACKGROUND

The human leukocyte antigen (HLA) system contains many highly variable genes. HLA genes play an important role in the human immune system, and HLA gene matching is crucial for the success of human organ transplantation. Numerous studies have demonstrated that variation in HLA genes is associated with many autoimmune, inflammatory and infectious diseases. However, typing HLA genes by serology or PCR is time-consuming and expensive, which limits large-scale studies involving HLA genes. Since single nucleotide polymorphism (SNP) genotype data are much easier and cheaper to obtain, accurate computational algorithms that infer HLA gene types from SNP genotype data are needed. The first step in such inference is to infer SNP haplotypes from genotypes. However, for the same SNP genotype data set, the haplotype configurations inferred by different methods are usually inconsistent, and it is often difficult to decide which one is correct.
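The haplotype ambiguity mentioned above can be illustrated with a minimal sketch. It assumes a common encoding that is not taken from the paper: each SNP genotype is the count (0, 1 or 2) of one allele, and a haplotype is a tuple of 0/1 alleles. The function name `compatible_haplotype_pairs` is hypothetical. A genotype with k heterozygous sites is consistent with 2^(k-1) unordered haplotype pairs, which is why different phasing methods can disagree on the same data.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of 0, 1 or 2, the count of the alternate allele at
    each SNP (an assumed encoding, used here only for illustration).
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed: 0 -> allele 0, 2 -> allele 1.
    # Heterozygous sites get a placeholder 0 that is overwritten below.
    base = [g // 2 for g in genotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, b in zip(het_sites, bits):
            h1[site], h2[site] = b, 1 - b
        # Sort the two haplotypes so (h1, h2) and (h2, h1) count once.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


# Two heterozygous sites give two possible phasings.
for pair in compatible_haplotype_pairs((1, 2, 1, 0)):
    print(pair)
```

Pedigree data, as used in the paper, narrows this set because a child's haplotypes must be inherited from its parents; that constraint is not modeled in this sketch.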

RESULTS

In this paper, we design an accurate HLA gene type inference algorithm that uses SNP genotype data from pedigrees, the known HLA gene types of some individuals, and the relationship between inferred SNP haplotypes and HLA gene types. Given a set of haplotypes inferred from the genotypes of a population consisting of many pedigrees, the algorithm first constructs a weighted similarity graph based on a new haplotype similarity measure and derives constraint edges from the known HLA gene types. Based on the principle that different HLA gene alleles should have different background haplotypes, the algorithm searches for an optimal labeling of all haplotypes with unknown HLA gene types such that the total weight among haplotypes with the same HLA gene type is maximized. To deal with ambiguous haplotype solutions, we use a genetic algorithm to select haplotype configurations that tend to maximize the same optimization criterion. Our experiments on a previously typed subset of the HapMap data show that the algorithm is highly accurate, achieving accuracies of 96% for HLA-A, 95% for HLA-B, 97% for HLA-C, 84% for HLA-DRB1, 98% for HLA-DQA1 and 97% for HLA-DQB1 in a leave-one-out test.
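The labeling objective described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not define the new similarity measure, so `similarity` below is a stand-in (fraction of matching SNP positions); the paper's optimal labeling search is replaced by a simple greedy pass; and constraint edges and the genetic algorithm step are omitted. All function names, haplotypes and allele labels are hypothetical.

```python
def similarity(h1, h2):
    """Stand-in edge weight: fraction of SNP positions where alleles agree.

    The paper's own haplotype similarity measure is not given in the
    abstract; this simple measure is an assumption for illustration.
    """
    return sum(a == b for a, b in zip(h1, h2)) / len(h1)


def within_label_weight(haplotypes, labels):
    """Objective: total edge weight between haplotypes sharing a label."""
    total = 0.0
    for i in range(len(haplotypes)):
        for j in range(i + 1, len(haplotypes)):
            if labels[i] is not None and labels[i] == labels[j]:
                total += similarity(haplotypes[i], haplotypes[j])
    return total


def greedy_label(haplotypes, known):
    """Greedy heuristic (not the paper's search procedure).

    known maps haplotype index -> HLA allele label and must be non-empty.
    Each unlabeled haplotype, in input order, receives the label with the
    largest summed similarity to the haplotypes labeled so far.
    """
    labels = [known.get(i) for i in range(len(haplotypes))]
    for i, h in enumerate(haplotypes):
        if labels[i] is not None:
            continue
        score = {}
        for j, lab in enumerate(labels):
            if lab is not None:
                score[lab] = score.get(lab, 0.0) + similarity(h, haplotypes[j])
        # Sorting the candidate labels makes tie-breaking deterministic.
        labels[i] = max(sorted(score), key=score.get)
    return labels


# Hypothetical toy data: six haplotypes, two of them with known types.
haplotypes = [
    (0, 0, 1, 1, 0), (0, 0, 1, 1, 1), (1, 1, 0, 0, 1),
    (1, 1, 0, 0, 0), (0, 0, 1, 0, 0), (1, 1, 0, 1, 1),
]
known = {0: "allele_X", 2: "allele_Y"}
labels = greedy_label(haplotypes, known)
print(labels, within_label_weight(haplotypes, labels))
```

Summing similarities favors labels that already have many members, so a greedy pass like this can differ from a labeling that truly maximizes the objective. A leave-one-out evaluation of the kind reported above would hide one known label at a time, rerun the labeling, and count how often the hidden label is recovered.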

CONCLUSIONS

Our algorithm can accurately infer HLA gene types from neighboring SNP genotype data. Compared with a recent approach on the same input data, our algorithm achieved higher accuracy. The code is freely available upon request from the corresponding authors.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1cc3/3024871/08b5eddb25a5/1471-2105-11-S11-S10-1.jpg
