Lu Tianchi, Wang Xueying, Nie Wan, Huo Miaozhe, Li Shuaicheng
Department of Computer Science, City University of Hong Kong, Kowloon 999077, Hong Kong.
Department of Computer Science, City University of Hong Kong (Dongguan), Dongguan 523000, China.
Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf008.
Precise prediction of epitope presentation on human leukocyte antigen (HLA) molecules is crucial for advancing vaccine development and immunotherapy. Conventional HLA-peptide binding affinity prediction tools often focus on specific alleles and lack a universal approach for comprehensive HLA site analysis. This limitation hinders efficient filtering of invalid peptide segments.
We introduce TransHLA, a tool for epitope prediction across all HLA alleles that integrates Transformer and Residue CNN architectures. TransHLA uses the ESM2 large language model for sequence and structure embeddings. For HLA class I, it reaches an accuracy of 84.72% and an area under the curve (AUC) of 91.95% on IEDB test data. For HLA class II, it reaches an accuracy of 79.94% and an AUC of 88.14%. In case studies using datasets such as CEDAR and VDJdb, TransHLA surpasses existing models in specificity and sensitivity for identifying immunogenic epitopes and neoepitopes.
TransHLA supports vaccine design and immunotherapy by efficiently identifying broadly reactive peptides. Data and code are publicly available at https://github.com/SkywalkerLuke/TransHLA.
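The accuracy and AUC figures reported above are standard binary-classification metrics. As a minimal sketch of how such metrics are computed from per-peptide scores, the following pure-Python example may help; it is not the authors' evaluation code. The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, not values from TransHLA or IEDB.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of peptides whose thresholded score matches the label.

    The 0.5 threshold is an assumption for this sketch.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive scores
    higher than a randomly chosen negative (ties count as one half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = epitope, 0 = non-epitope; scores are invented.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

acc = accuracy(labels, scores)
auc = roc_auc(labels, scores)
print(f"accuracy={acc:.4f} auc={auc:.4f}")
```

The pairwise AUC used here is quadratic in the number of examples, which is fine for a small illustration; a rank-based computation would be preferable on a test set of realistic size.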