Chen Yichao, Zhang Yuhan, He Yongqun
University of Michigan, Ann Arbor, MI 48109, USA.
Penn State University, State College, PA 16803, USA.
bioRxiv. 2024 Sep 8:2024.09.04.611295. doi: 10.1101/2024.09.04.611295.
Many vaccine design programs have been developed, including our own machine learning approaches Vaxign-ML and Vaxign-DL. Using deep learning techniques, Vaxign-DL predicts bacterial protective antigens by calculating 509 biological and biomedical features from protein sequences. In this study, we first used the protein folding ESM program to calculate a set of 1,280 features from individual protein sequences, and then used the new feature set, either alone or in combination with the traditional set of 509 features, to predict protective antigens. Our results showed that ESM-derived features alone predicted vaccine antigens accurately, with performance similar to that of the original Vaxign-DL prediction method, and that combining the ESM-derived features with the original Vaxign-DL features significantly improved prediction performance according to a set of seven scores, including specificity, sensitivity, and AUROC. To further evaluate the updated methods, we conducted a Leave-One-Pathogen-Out Validation (LOPOV) study and found that ESM-derived features significantly improved the prediction of vaccine antigens from 10 bacterial pathogens. This research is the first reported study demonstrating the added value of protein folding features for vaccine antigen prediction.
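The two mechanical steps named in the abstract, combining the 1,280 ESM-derived features with the 509 Vaxign-DL features and splitting the data for Leave-One-Pathogen-Out Validation, can be illustrated with a minimal sketch. The abstract does not say how the feature sets are combined or how the folds are built, so plain concatenation and one fold per pathogen label are assumptions here, and every function name (`combine_features`, `leave_one_pathogen_out`, `sensitivity_specificity`) is hypothetical rather than taken from the Vaxign-DL code.

```python
from collections import defaultdict

ESM_DIM = 1280     # ESM-derived features per protein (figure from the abstract)
VAXIGN_DIM = 509   # original Vaxign-DL features per protein (figure from the abstract)


def combine_features(esm_vec, vaxign_vec):
    """Concatenate the two feature vectors (assumed combination strategy)."""
    if len(esm_vec) != ESM_DIM or len(vaxign_vec) != VAXIGN_DIM:
        raise ValueError("unexpected feature vector length")
    return list(esm_vec) + list(vaxign_vec)


def leave_one_pathogen_out(pathogens):
    """Yield (held_out, train_idx, test_idx), one fold per distinct pathogen.

    `pathogens` holds one pathogen label per protein. All proteins from the
    held-out pathogen form the test set; all other proteins form the training set.
    """
    by_pathogen = defaultdict(list)
    for i, label in enumerate(pathogens):
        by_pathogen[label].append(i)
    for held_out in sorted(by_pathogen):
        test_idx = by_pathogen[held_out]
        train_idx = [i for i, label in enumerate(pathogens) if label != held_out]
        yield held_out, train_idx, test_idx


def sensitivity_specificity(y_true, y_pred):
    """Compute sensitivity and specificity for binary labels (1 = protective antigen)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy placeholder data, not real proteins or real feature values.
    combined = combine_features([0.0] * ESM_DIM, [1.0] * VAXIGN_DIM)
    print(len(combined))
    for held_out, train_idx, test_idx in leave_one_pathogen_out(["A", "B", "A", "C"]):
        print(held_out, train_idx, test_idx)
```

In a real evaluation, a classifier would be trained on `train_idx` and scored on `test_idx` for each fold, so that every pathogen is tested by a model that never saw its proteins during training.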