Martins Yasmmin C, Cerqueira E Costa Maiana O, Palumbo Miranda C, F Do Porto Dario, Custódio Fábio L, Trevizani Raphael, Nicolás Marisa Fabiana
Bioinformatics Laboratory, National Laboratory for Scientific Computing, Av. Getúlio Vargas 333, 25651-075 Petrópolis, Brazil.
Department of Biological Chemistry, Faculty of Exact and Natural Sciences, University of Buenos Aires - UBA, Av. Int. Cantilo, C1428 Buenos Aires, Argentina.
ACS Omega. 2025 Feb 3;10(6):5415-5429. doi: 10.1021/acsomega.4c07147. eCollection 2025 Feb 18.
Antigenicity prediction plays a crucial role in vaccine development, antibody-based therapies, and diagnostic assays, because it helps assess the potential of molecular structures to induce and recruit immune cells and to drive antibody production. Several existing prediction methods, which target complete proteins and epitopes identified through reverse vaccinology, are limited by input data constraints, by their feature extraction strategies, and by insufficient flexibility for model evaluation and interpretation. This work presents PAPreC (Pipeline for Antigenicity Prediction Comparison), an open-source, versatile workflow (available at https://github.com/YasCoMa/paprec_nx_workflow) designed to address these challenges. PAPreC systematically examines three key factors: the selection of training data sets, the feature extraction method (including physicochemical descriptors and ESM-2 encoder-derived embeddings), and the choice among diverse classifiers. It provides automated model evaluation, interpretability through SHapley Additive exPlanations (SHAP) analysis, and applicability domain assessments, enabling researchers to identify optimal configurations for their specific data sets. Using IEDB data as a reference, we demonstrate PAPreC's effectiveness across the ESKAPE pathogen group. A case study involving [...] and [...] shows that specific feature configurations are more suitable for different sequence types, and that ESM-2 embeddings enhance model performance. Moreover, our results indicate that separate models for Gram-positive and Gram-negative bacteria are not required. PAPreC offers a comprehensive, adaptable, and robust framework to streamline and improve antigenicity prediction for diverse bacterial data sets.
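The comparison the abstract describes (feature extraction methods crossed with classifiers, each configuration evaluated automatically) can be illustrated with a minimal sketch. This is not PAPreC's code and does not use its API: the function names, the two toy descriptors, the two simple classifiers, the leave-one-out scoring, and the made-up peptide strings and labels are all assumptions chosen so the example runs with the Python standard library alone. ESM-2 embeddings, SHAP analysis, and applicability domain assessment are omitted.

```python
from itertools import product
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Coarse residue classes, used here as a stand-in for physicochemical descriptors.
GROUPS = {
    "hydrophobic": set("AVILMFWC"),
    "charged": set("DEKR"),
    "polar": set("STNQYH"),
}


def aa_composition(seq):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def group_fractions(seq):
    """Fraction of residues in each coarse class (hydrophobic, charged, polar)."""
    return [sum(c in members for c in seq) / len(seq) for members in GROUPS.values()]


def nearest_centroid(train_X, train_y, x):
    """Predict the label whose mean feature vector is closest to x."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [v for v, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda label: dist(centroids[label], x))


def one_nearest_neighbour(train_X, train_y, x):
    """Predict the label of the single closest training vector."""
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return train_y[best]


FEATURIZERS = {"aa_composition": aa_composition, "group_fractions": group_fractions}
CLASSIFIERS = {"nearest_centroid": nearest_centroid, "1nn": one_nearest_neighbour}


def leave_one_out_accuracy(featurize, classify, seqs, labels):
    """Hold out each sequence in turn, train on the rest, and score the prediction."""
    X = [featurize(s) for s in seqs]
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += classify(train_X, train_y, X[i]) == labels[i]
    return hits / len(X)


def compare(seqs, labels):
    """Score every (feature method, classifier) pair on the same data set."""
    return {
        (f_name, c_name): leave_one_out_accuracy(f, c, seqs, labels)
        for (f_name, f), (c_name, c) in product(FEATURIZERS.items(), CLASSIFIERS.items())
    }


# Made-up peptides and labels for illustration only; not real antigenicity data.
TOY_SEQS = ["MKKLLDEERK", "DEKRKDSTNE", "KRDEESTQKD", "AVILMFWAVL", "LLIVAFMWCA", "GAVLIPFMWV"]
TOY_LABELS = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    for config, accuracy in sorted(compare(TOY_SEQS, TOY_LABELS).items()):
        print(config, round(accuracy, 2))
```

Keeping featurizers and classifiers in plain dictionaries means a third axis, such as alternative training data sets, could be added as one more term in the `product` call. In a realistic setting the toy classifiers and leave-one-out scoring would be replaced by proper models and cross-validation.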