HLAEquity: Examining biases in pan-allele peptide-HLA binding predictors.

Authors

Conev Anja, Fasoulis Romanos, Hall-Swan Sarah, Ferreira Rodrigo, Kavraki Lydia E

Affiliation

Department of Computer Science, Rice University, Houston, TX, USA.

Publication information

iScience. 2023 Dec 2;27(1):108613. doi: 10.1016/j.isci.2023.108613. eCollection 2024 Jan 19.

DOI: 10.1016/j.isci.2023.108613

PMID: 38188519

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10770483/

Abstract

Peptide-HLA (pHLA) binding prediction is essential in screening peptide candidates for personalized peptide vaccines. Machine learning (ML) pHLA binding prediction tools are trained on vast amounts of data and are effective in screening peptide candidates. Most ML models report the ability to generalize to HLA alleles unseen during training ("pan-allele" models). However, the use of datasets with imbalanced allele content raises concerns about biased model performance. First, we examine the data bias of two ML-based pan-allele pHLA binding predictors. We find that the pHLA datasets overrepresent alleles from geographic populations of high-income countries. Second, we show that the identified data bias is perpetuated within ML models, leading to algorithmic bias and subpar performance for alleles expressed in low-income geographic populations. We draw attention to the potential therapeutic consequences of this bias, and we challenge the use of the term "pan-allele" to describe models trained with currently available public datasets.
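
As a rough illustration of what checking a pHLA dataset for imbalanced allele content might involve, the sketch below computes the share of training records that falls to each group of alleles. It is not the paper's method: the function name `representation_by_group`, the group labels, the allele-to-group mapping, the peptide strings, and the record counts are all hypothetical placeholders chosen only to make the example runnable.

```python
from collections import Counter


def representation_by_group(records, allele_to_group):
    """Return, for each group, the fraction of (allele, peptide) records
    whose allele is mapped to that group."""
    counts = Counter()
    for allele, _peptide in records:
        counts[allele_to_group[allele]] += 1
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


# Hypothetical toy data: the peptides are placeholder 9-mers and the
# group assignments are invented for illustration, not taken from the paper.
records = [
    ("HLA-A*02:01", "AAAAAAAAA"),
    ("HLA-A*02:01", "CCCCCCCCC"),
    ("HLA-A*01:01", "DDDDDDDDD"),
    ("HLA-B*15:03", "EEEEEEEEE"),
]
allele_to_group = {
    "HLA-A*02:01": "group_A",
    "HLA-A*01:01": "group_A",
    "HLA-B*15:03": "group_B",
}

shares = representation_by_group(records, allele_to_group)
for group, share in sorted(shares.items()):
    print(f"{group}: {share:.2f} of records")
```

In this toy input, three of the four records belong to `group_A`, so a model trained on it would see far fewer examples for `group_B` alleles. Comparing such shares against how common each allele is in the populations of interest is one simple way to surface the kind of data bias the abstract describes.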

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/925f/10770483/30e25c7624bd/fx1.jpg
