Ge Fang, Li Hao-Yang, Zhang Ming, Arif Muhammad, Alam Tanvir
State Key Laboratory of Organic Electronics and Information Displays, Institute of Advanced Materials (IAM), Nanjing University of Posts and Telecommunications, 6 Wenyuan Road, Nanjing 210023, China.
School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang 212100, China.
ACS Omega. 2024 Dec 16;9(52):51494-51507. doi: 10.1021/acsomega.4c08715. eCollection 2024 Dec 31.
Hepatitis C virus (HCV) is a bloodborne RNA virus that causes severe liver disease, and no effective prophylactic biologics are currently available to prevent its transmission. HCV prevention is closely linked to the major histocompatibility complex (MHC): linear antigenic peptides of HCV, known as T cell epitopes (TCEs), are presented by MHC molecules to T cells and play a key role in the immune response. Rapid and accurate identification of these HCV T cell epitopes (TCE-HCVs) is therefore essential for advancing vaccine development. Herein, we propose TCellPredX, a novel integrated predictor for TCE-HCV identification. TCellPredX uses five distinct feature encoding schemes, including local and global sequence encodings, composition-transition-distribution descriptors, physicochemical properties, and embeddings from two protein language models; the resulting features are processed by 12 machine learning algorithms. Our results indicate that feature fusion significantly improves predictive accuracy. Moreover, the maximal relevance minimal redundancy (mRMR) feature selection method is particularly effective at isolating informative features, ensuring that the model uses the most informative data. Ensemble models, especially when combined with an averaged voting strategy, show greater stability and accuracy than individual classifiers, reducing noise and improving model robustness. TCellPredX achieves accuracies of 0.900 in 10-fold cross-validation and 0.897 on the independent test. Furthermore, TCellPredX's high accuracy is confirmed on experimentally verified peptide sequences that have been documented as potentially beneficial for vaccine development. Overall, TCellPredX can offer a robust tool for the precise identification of TCE-HCVs, potentially serving as a cornerstone for future epitope research and for advancing HCV vaccine development.
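The abstract names two algorithmic components, mRMR feature selection and averaged voting over an ensemble of classifiers. The Python sketch below illustrates both under stated assumptions; it is not the TCellPredX implementation. Assumptions: features are already discretized into integer bins, mutual information is estimated from empirical counts, mRMR uses the mutual-information-difference (MID) criterion (relevance minus mean redundancy), and "averaged voting" is read as soft voting, i.e. averaging each classifier's positive-class probability and thresholding at 0.5. The function names `mutual_information`, `mrmr_select` and `average_vote` are hypothetical.

```python
"""Minimal sketch of mRMR feature selection and averaged (soft) voting.

Assumptions (not taken from TCellPredX): features are discretized integers,
MI is estimated from empirical counts, and mRMR uses the MID criterion.
"""
import math
from collections import Counter
from typing import List, Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Empirical mutual information I(X;Y) in nats for two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_select(features: List[Sequence[int]], labels: Sequence[int], k: int) -> List[int]:
    """Greedy mRMR (MID): repeatedly pick the feature with the highest
    relevance I(f; y) minus its mean redundancy I(f; s) over already-selected s."""
    relevance = [mutual_information(f, labels) for f in features]
    selected: List[int] = []
    remaining = list(range(len(features)))

    def mid_score(i: int) -> float:
        if not selected:
            return relevance[i]
        redundancy = sum(mutual_information(features[i], features[j]) for j in selected)
        return relevance[i] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=mid_score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


def average_vote(probabilities: List[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Averaged (soft) voting: mean positive-class probability per sample
    across all models, then a single threshold."""
    n_models = len(probabilities)
    n_samples = len(probabilities[0])
    averaged = [sum(p[i] for p in probabilities) / n_models for i in range(n_samples)]
    return [1 if a >= threshold else 0 for a in averaged]


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 0, 0, 1, 1]
    f0 = [0, 0, 0, 0, 1, 1, 1, 1]
    f1 = list(f0)                  # exact duplicate of f0 (fully redundant)
    f2 = [0, 0, 1, 1, 0, 0, 1, 1]  # as relevant as f0, but independent of it
    print(mrmr_select([f0, f1, f2], labels, k=2))  # → [0, 2]

    # Rows are models, columns are samples (positive-class probabilities).
    probs = [[0.9, 0.2, 0.6], [0.7, 0.4, 0.3], [0.8, 0.1, 0.5]]
    print(average_vote(probs))  # → [1, 0, 0]
```

In the demo, `f0` and `f1` are identical while `f2` is equally relevant but independent of `f0`, so the redundancy penalty steers mRMR to pick `f2` rather than the duplicate `f1`. In the voting demo, a hard majority vote would label the third sample positive (two of three models give a probability of at least 0.5), whereas the averaged probability (about 0.47) falls below the threshold.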