Bukhari Syed Nisar Hussain, Ogudo Kingsley A
National Institute of Electronics and Information Technology (NIELIT), Ministry of Electronics and Information Technology (MeitY), Government of India, Srinagar 191132, India.
Department of Electrical & Electronics Engineering, Faculty of Engineering and the Built Environment, University of Johannesburg, Johannesburg 0524, South Africa.
Bioengineering (Basel). 2024 Aug 5;11(8):791. doi: 10.3390/bioengineering11080791.
Respiratory syncytial virus (RSV) is a common respiratory pathogen that infects the human lungs and respiratory tract, often causing symptoms similar to those of the common cold. Vaccination is the most effective strategy for managing viral outbreaks, and extensive efforts are currently focused on developing a vaccine for RSV. Traditional vaccine design typically uses an attenuated form of the pathogen to elicit an immune response. In contrast, peptide-based vaccines (PBVs) aim to identify and chemically synthesize specific immunodominant peptides (IPs), known as T-cell epitopes (TCEs), to induce a targeted immune response. Despite their potential for enhancing vaccine safety and immunogenicity, PBVs have received comparatively little attention. Identifying IPs for PBV design through conventional wet-lab experiments is challenging, costly, and time-consuming. Machine learning (ML) techniques offer a promising alternative that can predict TCEs accurately and significantly reduce the time and cost of vaccine development. This study develops and evaluates eight hybrid ML predictive models, one for each combination of two classification methods, two feature weighting techniques, and two feature selection algorithms (2 × 2 × 2 = 8), all aimed at predicting the TCEs of RSV. The models were trained on experimentally determined TCE and non-TCE sequences acquired from the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) repository. The hybrid model combining the XGBoost (XGB) classifier, the chi-squared (ChST) weighting technique, and backward search (BST) as the optimal feature selection algorithm (ChST-BST-XGB) was identified as the best model, achieving an accuracy, sensitivity, specificity, F1 score, AUC, precision, and MCC of 97.10%, 0.98, 0.97, 0.98, 0.99, 0.99, and 0.96, respectively.
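As a rough illustration of how the threshold-based metrics listed above relate to one another, the following minimal sketch computes them from the four cells of a binary confusion matrix (TCE as the positive class). The function name `binary_metrics` and the counts are made up for illustration and are not the study's code or data; AUC is omitted because it requires ranked prediction scores, not just counts, and the sketch assumes every denominator is nonzero.

```python
from math import sqrt

def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is nonzero (each class is both present and predicted).
    """
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient: ranges from -1 to +1
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }

# Hypothetical counts for illustration only; NOT taken from the study.
m = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the TCE and non-TCE classes are imbalanced.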
Additionally, K-fold cross-validation (KFCV) was performed to assess the model's reliability, and the ChST-BST-XGB model recorded an average accuracy of 97.21%. The results indicate that the hybrid XGBoost model consistently outperforms the other hybrid approaches. The epitopes predicted by the proposed model may serve as promising vaccine candidates for RSV, subject to in vitro and in vivo assessment. This model can help the scientific community expedite the screening of active TCE candidates for RSV, ultimately saving time and resources in vaccine development.
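The averaging step in K-fold cross-validation can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's procedure: the abstract does not give the value of K or say whether folds were shuffled or stratified, so the sketch uses contiguous, unshuffled folds. The helpers `k_fold_indices`, `cross_validated_accuracy`, `fit_majority`, and `predict_majority` are hypothetical names, and the majority-class "model" is only a stand-in for a real classifier such as XGBoost.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cross_validated_accuracy(features, labels, k, fit, predict):
    """Train on k-1 folds, test on the held-out fold, and average the accuracies."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)

# Stand-in "classifier": always predicts the majority label seen in training.
def fit_majority(train_features, train_labels):
    return max(set(train_labels), key=train_labels.count)

def predict_majority(model, feature):
    return model

# Toy data for illustration only; NOT the study's sequences.
features = list(range(10))
labels = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
mean_accuracy = cross_validated_accuracy(
    features, labels, k=5, fit=fit_majority, predict=predict_majority
)
print(f"mean accuracy over 5 folds: {mean_accuracy:.2f}")
```

Every sample is used for testing exactly once, so the mean reflects performance across the whole dataset and not a single train/test split.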