PAPreC: A Pipeline for Antigenicity Prediction Comparison Methods across Bacteria.

Author information

Martins Yasmmin C, Cerqueira E Costa Maiana O, Palumbo Miranda C, F Do Porto Dario, Custódio Fábio L, Trevizani Raphael, Nicolás Marisa Fabiana

Affiliations

Bioinformatics Laboratory, National Laboratory for Scientific Computing, Av. Getúlio Vargas 333, 25651-075 Petrópolis, Brazil.

Department of Biological Chemistry, Faculty of Exact and Natural Sciences, University of Buenos Aires - UBA, Av. Int. Cantilo, C1428 Buenos Aires, Argentina.

Publication information

ACS Omega. 2025 Feb 3;10(6):5415-5429. doi: 10.1021/acsomega.4c07147. eCollection 2025 Feb 18.

Antigenicity prediction plays a crucial role in vaccine development, antibody-based therapies, and diagnostic assays, as this predictive approach helps assess the potential of molecular structures to induce and recruit immune cells and drive antibody production. Several existing prediction methods, which target complete proteins and epitopes identified through reverse vaccinology, face limitations regarding input data constraints, feature extraction strategies, and insufficient flexibility for model evaluation and interpretation. This work presents PAPreC (Pipeline for Antigenicity Prediction Comparison), an open-source, versatile workflow (available at https://github.com/YasCoMa/paprec_nx_workflow) designed to address these challenges. PAPreC systematically examines three key factors: the selection of training data sets, feature extraction methods (including physicochemical descriptors and ESM-2 encoder-derived embeddings), and diverse classifiers. It provides automated model evaluation, interpretability through SHapley Additive exPlanations (SHAP) analysis, and applicability domain assessments, enabling researchers to identify optimal configurations for their specific data sets. Applying PAPreC to IEDB data as a reference, we demonstrate its effectiveness across the ESKAPE pathogen group. A case study involving and shows that specific feature configurations are more suitable for different sequence types, and that ESM-2 embeddings enhance model performance. Moreover, our results indicate that separate models for Gram-positive and Gram-negative bacteria are not required. PAPreC offers a comprehensive, adaptable, and robust framework to streamline and improve antigenicity prediction for diverse bacterial data sets.
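
The comparison idea described in the abstract (crossing feature extraction methods with classifiers and scoring each configuration) can be illustrated with a minimal, standard-library-only sketch. This is not PAPreC's code: the toy sequences and labels are invented for illustration, the two descriptor functions merely stand in for the physicochemical descriptors and ESM-2 embeddings, and the two simple classifiers and the leave-one-out scoring are assumptions chosen to keep the example self-contained.

```python
from itertools import product
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Kyte-Doolittle hydropathy scale
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def aa_composition(seq):
    """20-dimensional amino acid frequency vector."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def hydropathy_charge(seq):
    """Two descriptors: mean hydropathy and net charge per residue."""
    mean_kd = sum(KD[a] for a in seq) / len(seq)
    positive = sum(seq.count(a) for a in "KR")
    negative = sum(seq.count(a) for a in "DE")
    return [mean_kd, (positive - negative) / len(seq)]

def nearest_centroid(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [v for v, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(sorted(centroids), key=lambda lab: dist(centroids[lab], x))

def one_nearest_neighbor(train_X, train_y, x):
    """Predict the label of the single closest training vector."""
    return min(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[1]

def loo_accuracy(X, y, classify):
    """Leave-one-out accuracy of a classifier on feature vectors X."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += classify(train_X, train_y, X[i]) == y[i]
    return hits / len(X)

def compare(sequences, labels, extractors, classifiers):
    """Score every (feature extractor, classifier) configuration."""
    results = {}
    for (f_name, extract), (c_name, classify) in product(
            extractors.items(), classifiers.items()):
        X = [extract(s) for s in sequences]
        results[(f_name, c_name)] = loo_accuracy(X, labels, classify)
    return results

# Hypothetical toy data: label 1 = "antigenic", label 0 = "non-antigenic".
sequences = ["MKKLLDEEKRSD", "DEKRKSDNEQKT", "KRDESNQTKEDR",
             "LLVVAIIGFLLA", "AVILFFGLVIAM", "IVLAGFLLVVIA"]
labels = [1, 1, 1, 0, 0, 0]

extractors = {"composition": aa_composition,
              "hydropathy_charge": hydropathy_charge}
classifiers = {"nearest_centroid": nearest_centroid,
               "1nn": one_nearest_neighbor}

results = compare(sequences, labels, extractors, classifiers)
for config, accuracy in sorted(results.items()):
    print(config, round(accuracy, 2))
```

In a realistic setting the training data set would be a third axis of the grid, and each configuration would be scored with held-out data and several metrics rather than leave-one-out accuracy on a handful of sequences.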