Efficient HLA imputation from sequential SNPs data by transformer.

Affiliations

Faculty of Engineering, Kyoto University, Kyoto, Japan.

Advanced Data Science Project, RIKEN Information R&D and Strategy Headquarters, RIKEN, Tokyo, Japan.

Publication information

J Hum Genet. 2024 Oct;69(10):533-540. doi: 10.1038/s10038-024-01278-x. Epub 2024 Aug 2.

DOI:10.1038/s10038-024-01278-x

PMID:39095607

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11422163/

Abstract

Human leukocyte antigen (HLA) genes are associated with a variety of diseases, yet the direct typing of HLA alleles is both time-consuming and costly. Consequently, various imputation methods leveraging sequential single nucleotide polymorphism (SNP) data have been proposed, employing either statistical or deep learning models, such as the convolutional neural network (CNN)-based model DEEPHLA. However, these methods exhibit limited imputation efficiency for infrequent alleles and require a large reference dataset. In this context, we have developed a Transformer-based model for HLA allele imputation, named "HLA Reliable IMputatioN by Transformer (HLARIMNT)", designed to exploit the sequential nature of SNP data. We evaluated HLARIMNT's performance using two distinct reference panels, the Pan-Asian reference panel (n = 530) and the Type 1 Diabetes Genetics Consortium (T1DGC) reference panel (n = 5225), alongside a combined panel (n = 1060). HLARIMNT demonstrated superior accuracy to DEEPHLA across several indices, particularly for infrequent alleles. Furthermore, we explored the impact of varying training data sizes on imputation accuracy, finding that HLARIMNT consistently outperformed DEEPHLA across all data sizes. These findings suggest that Transformer-based models can efficiently impute not only HLA types but potentially other gene types from sequential SNP data.
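
The abstract reports accuracy separately for infrequent alleles. One way to picture such a comparison is to stratify per-call accuracy by allele frequency in the reference panel. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: the function name `accuracy_by_frequency`, the frequency threshold, and the toy allele calls are all hypothetical, and the abstract does not say which indices the paper actually uses.

```python
from collections import Counter, defaultdict


def accuracy_by_frequency(true_alleles, imputed_alleles, reference_alleles, threshold=0.05):
    """Return per-call accuracy for 'infrequent' and 'frequent' alleles.

    An allele counts as 'infrequent' when its frequency in reference_alleles
    is below threshold (a hypothetical cut-off chosen for illustration).
    """
    if len(true_alleles) != len(imputed_alleles):
        raise ValueError("true and imputed allele lists must have the same length")

    counts = Counter(reference_alleles)
    total = len(reference_alleles)

    correct = defaultdict(int)
    seen = defaultdict(int)
    for truth, guess in zip(true_alleles, imputed_alleles):
        # Frequency of the true allele in the reference panel (0 if unseen).
        freq = counts[truth] / total if total else 0.0
        bin_name = "infrequent" if freq < threshold else "frequent"
        seen[bin_name] += 1
        correct[bin_name] += int(truth == guess)

    # Only bins that received at least one call appear in the result.
    return {name: correct[name] / seen[name] for name in seen}


# Toy data, invented for illustration only (not taken from the paper).
reference = ["A*24:02"] * 18 + ["A*02:01"] + ["A*33:03"]
truth = ["A*24:02", "A*24:02", "A*02:01", "A*33:03"]
imputed = ["A*24:02", "A*24:02", "A*02:01", "A*24:02"]

result = accuracy_by_frequency(truth, imputed, reference, threshold=0.1)
print(result)
```

In this toy case the two rare alleles each have a reference frequency of 0.05, so they fall in the infrequent bin. Binning by the true allele's frequency keeps errors on rare alleles from being masked by the many correct calls on common ones.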

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99ef/11422163/905debf9b94a/10038_2024_1278_Fig1_HTML.jpg

Similar articles

A multi-ethnic reference panel to impute HLA classical and non-classical class I alleles in admixed samples: Testing imputation accuracy in an admixed sample from Brazil.

HLA. 2024 Jun;103(6):e15543. doi: 10.1111/tan.15543.

Imputing amino acid polymorphisms in human leukocyte antigens.

PLoS One. 2013 Jun 6;8(6):e64683. doi: 10.1371/journal.pone.0064683. Print 2013.

Deep Learning-Based HLA Allele Imputation Applicable to GWAS.

Methods Mol Biol. 2024;2809:77-85. doi: 10.1007/978-1-0716-3874-3_5.

Construction and benchmarking of a multi-ethnic reference panel for the imputation of HLA class I and II alleles.

Hum Mol Genet. 2019 Jun 15;28(12):2078-2092. doi: 10.1093/hmg/ddy443.

18th International HLA and Immunogenetics Workshop: Report on the SNP-HLA Reference Consortium (SHLARC) component.

HLA. 2024 Jan;103(1):e15293. doi: 10.1111/tan.15293. Epub 2023 Nov 10.

SNP-HLA Reference Consortium (SHLARC): HLA and SNP data sharing for promoting MHC-centric analyses in genomics.

Genet Epidemiol. 2020 Oct;44(7):733-740. doi: 10.1002/gepi.22334. Epub 2020 Jul 18.

Accurate imputation of human leukocyte antigens with CookHLA.

Nat Commun. 2021 Feb 24;12(1):1264. doi: 10.1038/s41467-021-21541-5.

SNP-based HLA allele tagging, imputation and association with antiepileptic drug-induced cutaneous reactions in Hong Kong Han Chinese.

Pharmacogenomics J. 2018 Apr;18(2):340-346. doi: 10.1038/tpj.2017.11. Epub 2017 Apr 11.

Imputation-Based HLA Typing with GWAS SNPs.

Methods Mol Biol. 2024;2809:127-143. doi: 10.1007/978-1-0716-3874-3_9.

Cited by

Machine learning models for pharmacogenomic variant effect predictions - recent developments and future frontiers.

Pharmacogenomics. 2025 Apr-Apr;26(5-6):171-182. doi: 10.1080/14622416.2025.2504863. Epub 2025 May 22.

STICI: Split-Transformer with integrated convolutions for genotype imputation.

Nat Commun. 2025 Jan 31;16(1):1218. doi: 10.1038/s41467-025-56273-3.

Genotype imputation methods for whole and complex genomic regions utilizing deep learning technology.

J Hum Genet. 2024 Oct;69(10):481-486. doi: 10.1038/s10038-023-01213-6. Epub 2024 Jan 15.

Aligned deep neural network for integrative analysis with high-dimensional input.

J Biomed Inform. 2023 Aug;144:104434. doi: 10.1016/j.jbi.2023.104434. Epub 2023 Jun 28.

References

Medical deep learning-A systematic meta-review.

Comput Methods Programs Biomed. 2022 Jun;221:106874. doi: 10.1016/j.cmpb.2022.106874. Epub 2022 May 11.

Highly accurate protein structure prediction with AlphaFold.

Nature. 2021 Aug;596(7873):583-589. doi: 10.1038/s41586-021-03819-2. Epub 2021 Jul 15.

A deep learning method for HLA imputation and trans-ethnic MHC fine-mapping of type 1 diabetes.

Nat Commun. 2021 Mar 12;12(1):1639. doi: 10.1038/s41467-021-21975-x.

Accurate imputation of human leukocyte antigens with CookHLA.

Nat Commun. 2021 Feb 24;12(1):1264. doi: 10.1038/s41467-021-21541-5.

A genotype imputation method for de-identified haplotype reference information by using recurrent neural network.

PLoS Comput Biol. 2020 Oct 1;16(10):e1008207. doi: 10.1371/journal.pcbi.1008207. eCollection 2020 Oct.

Genetic and phenotypic landscape of the major histocompatibilty complex region in the Japanese population.

Nat Genet. 2019 Mar;51(3):470-480. doi: 10.1038/s41588-018-0336-0. Epub 2019 Jan 28.

HLA Association with Drug-Induced Adverse Reactions.

J Immunol Res. 2017;2017:3186328. doi: 10.1155/2017/3186328. Epub 2017 Nov 23.

HLA variation and disease.

Nat Rev Immunol. 2018 May;18(5):325-339. doi: 10.1038/nri.2017.143. Epub 2018 Jan 2.

Use of HLA-B*58:01 genotyping to prevent allopurinol induced severe cutaneous adverse reactions in Taiwan: national prospective cohort study.

BMJ. 2015 Sep 23;351:h4848. doi: 10.1136/bmj.h4848.

Additive and interaction effects at three amino acid positions in HLA-DQ and HLA-DR molecules drive type 1 diabetes risk.

Nat Genet. 2015 Aug;47(8):898-905. doi: 10.1038/ng.3353. Epub 2015 Jul 13.
