Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.
Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.
J Transl Med. 2021 Oct 27;19(1):449. doi: 10.1186/s12967-021-03084-x.
Cancer is one of the most serious diseases threatening human health. Cancer immunotherapy is one of the most promising treatment strategies because of its high efficacy and selectivity and its lower side effects compared with traditional treatments. Identifying tumor T cell antigens is one of the most important tasks in antitumor vaccine development and molecular function investigation. Although several machine learning predictors have been developed to identify tumor T cell antigens, accurate identification with existing methods remains challenging.
In this study, we used a non-redundant dataset of 592 tumor T cell antigens (positive samples) and 393 non-tumor T cell antigens (negative samples). Four types of feature encoding methods were studied to build an efficient predictor, including amino acid composition, global protein sequence descriptors, and grouped amino acid and peptide composition. To improve the representation ability of the hybrid features, we further employed a two-step feature selection technique to search for the optimal feature subset. The final prediction model was constructed using the random forest algorithm.
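As an illustration of the simplest encoding named above, amino acid composition can be sketched as the fraction of each of the 20 standard residues in a peptide. This is a minimal sketch, not the authors' implementation: the function name `amino_acid_composition`, the residue ordering, and the choice to ignore non-standard characters are all assumptions made for this example.

```python
from collections import Counter

# The 20 standard amino acids in alphabetical one-letter order
# (the ordering is an assumption; any fixed order works).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide: str) -> list[float]:
    """Return a 20-dimensional vector of residue frequencies.

    Characters outside the 20 standard residues are ignored
    (an assumption for this sketch).
    """
    counts = Counter(peptide.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("peptide contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical peptide used only to show the call.
    vector = amino_acid_composition("AAC")
    print(len(vector), vector[:2])
```

A vector like this would be concatenated with the other descriptor types to form the hybrid feature set before feature selection and classifier training.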
Finally, the top 263 informative features were selected to train the random forest classifier for detecting tumor T cell antigen peptides. iTTCA-RF provides satisfactory performance, with balanced accuracy, specificity and sensitivity values of 83.71%, 78.73% and 88.69% over tenfold cross-validation and 73.14%, 62.67% and 83.61% on the independent test, respectively. The online prediction server is freely accessible at http://lab.malab.cn/~acy/iTTCA .
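The reported balanced accuracy values are consistent with the standard definition, the arithmetic mean of sensitivity and specificity. The short check below assumes that definition; the function name `balanced_accuracy` is chosen for this example and is not taken from the paper.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Balanced accuracy as the mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


# Values in percent, as reported in the abstract.
cross_validation = balanced_accuracy(sensitivity=88.69, specificity=78.73)
independent_test = balanced_accuracy(sensitivity=83.61, specificity=62.67)

if __name__ == "__main__":
    print(f"tenfold cross-validation: {cross_validation:.2f}%")
    print(f"independent test: {independent_test:.2f}%")
```

Because the dataset is imbalanced (592 positives versus 393 negatives), balanced accuracy weights both classes equally, which plain accuracy would not.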
We have shown that the proposed predictor iTTCA-RF outperforms other recent models, and we hope it will become an effective and useful tool for identifying tumor T cell antigens presented in the context of major histocompatibility complex class I.