StackTTCA: a stacking ensemble learning-based framework for accurate and high-throughput identification of tumor T cell antigens.

Affiliations

Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand.

Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.

Publication information

BMC Bioinformatics. 2023 Jul 28;24(1):301. doi: 10.1186/s12859-023-05421-x.

DOI:10.1186/s12859-023-05421-x
PMID:37507654
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10386778/
Abstract

BACKGROUND

The identification of tumor T cell antigens (TTCAs) is crucial for providing insights into their functional mechanisms and for utilizing their potential in anticancer vaccine development. In this context, TTCAs are highly promising. However, experimental technologies for discovering and characterizing new TTCAs are expensive and time-consuming. Although many machine learning (ML)-based models have been proposed for identifying new TTCAs, there is still a need to develop a robust model that can achieve higher accuracy and precision.

RESULTS

In this study, we propose a new stacking ensemble learning-based framework, termed StackTTCA, for accurate and large-scale identification of TTCAs. Firstly, we constructed 156 different baseline models by combining 12 different feature encoding schemes with 13 popular ML algorithms (12 × 13 = 156). Secondly, these baseline models were trained and employed to create a new probabilistic feature vector. Finally, the optimal probabilistic feature vector was determined based on the feature selection strategy and then used for the construction of our stacked model. Comparative benchmarking experiments indicated that StackTTCA clearly outperformed several ML classifiers and the existing methods in terms of the independent test, with an accuracy of 0.932 and a Matthews correlation coefficient of 0.866.
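
The three steps described above can be illustrated in miniature. The following is a minimal stdlib-only Python sketch, not the authors' implementation. It rests on several assumptions: the two encodings (`enc_hydrophobic`, `enc_charged`), the nearest-centroid learner (`fit_centroid`), and the peptides are all invented for illustration, and the feature selection step is omitted. It shows how each baseline model contributes one out-of-fold probability to a new feature vector, how a meta-model is trained on that vector, and how accuracy and the Matthews correlation coefficient (MCC) are computed on held-out data.

```python
from math import sqrt

HYDROPHOBIC, CHARGED = set("AVILMFWY"), set("DEKR")

# Two hypothetical feature encodings (the paper uses 12; none are named here).
def enc_hydrophobic(seq):
    return [sum(c in HYDROPHOBIC for c in seq) / len(seq)]

def enc_charged(seq):
    return [sum(c in CHARGED for c in seq) / len(seq)]

def fit_centroid(X, y):
    """Toy learner (an assumption standing in for the paper's 13 algorithms):
    score a sample by its relative distance to the two class centroids."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos, neg = centroid(1), centroid(0)
    def predict_proba(x):
        d_pos = sqrt(sum((a - b) ** 2 for a, b in zip(x, pos)))
        d_neg = sqrt(sum((a - b) ** 2 for a, b in zip(x, neg)))
        total = d_pos + d_neg
        return 0.5 if total == 0 else d_neg / total
    return predict_proba

def fit_stack(seqs, y, encoders):
    """Train baseline models, then a meta-model on their out-of-fold probabilities."""
    n = len(seqs)
    meta_X = [[] for _ in range(n)]  # one probabilistic feature vector per sample
    for enc in encoders:
        X = [enc(s) for s in seqs]
        for i in range(n):  # leave-one-out: sample i is scored by a model that never saw it
            model = fit_centroid(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
            meta_X[i].append(model(X[i]))
    bases = [fit_centroid([enc(s) for s in seqs], y) for enc in encoders]
    meta = fit_centroid(meta_X, y)
    def predict(seq):
        probs = [base(enc(seq)) for enc, base in zip(encoders, bases)]
        return 1 if meta(probs) >= 0.5 else 0
    return predict

def confusion(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Synthetic peptides invented for illustration; not data from the study.
train = ["AVILMF", "AVLKWF", "ILMFWY", "AVIDLM", "DEKRDE", "DEKRAG", "KRDEGS", "GSDEKR"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predict = fit_stack(train, labels, [enc_hydrophobic, enc_charged])
test_seqs, test_labels = ["AVILWY", "DEKRKR"], [1, 0]
preds = [predict(s) for s in test_seqs]
counts = confusion(test_labels, preds)
print(preds, accuracy(*counts), mcc(*counts))
```

Training the meta-model on out-of-fold probabilities rather than on in-sample predictions is the usual way to keep the stacked model from simply memorizing the baseline models' training fit. In the framework described above, the probabilistic feature vector would hold one entry per baseline model, and the feature selection step would then keep only the most useful entries.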

CONCLUSIONS

In summary, the proposed stacking ensemble learning-based framework of StackTTCA could help to precisely and rapidly identify true TTCAs for follow-up experimental verification. In addition, we developed an online web server (http://2pmlab.camt.cmu.ac.th/StackTTCA) to maximize user convenience for high-throughput screening of novel TTCAs.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee1f/10386778/01593471624f/12859_2023_5421_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee1f/10386778/58b8aa2b3411/12859_2023_5421_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee1f/10386778/399008a2984d/12859_2023_5421_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee1f/10386778/0d1de1835a89/12859_2023_5421_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee1f/10386778/291b3daeca82/12859_2023_5421_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee1f/10386778/7e37154cab4b/12859_2023_5421_Fig6_HTML.jpg

Similar articles

1
StackTTCA: a stacking ensemble learning-based framework for accurate and high-throughput identification of tumor T cell antigens.
BMC Bioinformatics. 2023 Jul 28;24(1):301. doi: 10.1186/s12859-023-05421-x.
2
PSRTTCA: A new approach for improving the prediction and characterization of tumor T cell antigens using propensity score representation learning.
Comput Biol Med. 2023 Jan;152:106368. doi: 10.1016/j.compbiomed.2022.106368. Epub 2022 Nov 26.
3
An integrative machine learning model for the identification of tumor T-cell antigens.
Biosystems. 2024 Mar;237:105177. doi: 10.1016/j.biosystems.2024.105177. Epub 2024 Mar 6.
4
iTTCA-MVL: A multi-view learning model based on physicochemical information and sequence statistical information for tumor T cell antigens identification.
Comput Biol Med. 2024 Mar;170:107941. doi: 10.1016/j.compbiomed.2024.107941. Epub 2024 Jan 1.
5
iTTCA-MFF: identifying tumor T cell antigens based on multiple feature fusion.
Immunogenetics. 2022 Oct;74(5):447-454. doi: 10.1007/s00251-022-01258-5. Epub 2022 Mar 5.
6
ENCAP: Computational prediction of tumor T cell antigens with ensemble classifiers and diverse sequence features.
PLoS One. 2024 Jul 18;19(7):e0307176. doi: 10.1371/journal.pone.0307176. eCollection 2024.
7
A novel stacking-based predictor for accurate prediction of antimicrobial peptides.
J Biomol Struct Dyn. 2024 Mar 18:1-12. doi: 10.1080/07391102.2024.2329298.
8
Pretoria: An effective computational approach for accurate and high-throughput identification of CD8 t-cell epitopes of eukaryotic pathogens.
Int J Biol Macromol. 2023 May 31;238:124228. doi: 10.1016/j.ijbiomac.2023.124228. Epub 2023 Mar 29.
9
StackIL6: a stacking ensemble model for improving the prediction of IL-6 inducing peptides.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab172.
10
NEPTUNE: A novel computational approach for accurate and large-scale identification of tumor homing peptides.
Comput Biol Med. 2022 Sep;148:105700. doi: 10.1016/j.compbiomed.2022.105700. Epub 2022 Jun 7.

Cited by

1
Advancing the accuracy of clathrin protein prediction through multi-source protein language models.
Sci Rep. 2025 Jul 8;15(1):24403. doi: 10.1038/s41598-025-08510-4.
2
BGATT-GR: accurate identification of glucocorticoid receptor antagonists based on data augmentation combined with BiGRU-attention.
Sci Rep. 2025 Jul 1;15(1):21402. doi: 10.1038/s41598-025-05839-8.
3
Empirical Comparison and Analysis of Artificial Intelligence-Based Methods for Identifying Phosphorylation Sites of SARS-CoV-2 Infection.
Int J Mol Sci. 2024 Dec 21;25(24):13674. doi: 10.3390/ijms252413674.
4
Tumor-Derived Antigenic Peptides as Potential Cancer Vaccines.
Int J Mol Sci. 2024 Apr 30;25(9):4934. doi: 10.3390/ijms25094934.

References

1
PSRTTCA: A new approach for improving the prediction and characterization of tumor T cell antigens using propensity score representation learning.
Comput Biol Med. 2023 Jan;152:106368. doi: 10.1016/j.compbiomed.2022.106368. Epub 2022 Nov 26.
2
Computational prediction and interpretation of druggable proteins using a stacked ensemble-learning framework.
iScience. 2022 Aug 5;25(9):104883. doi: 10.1016/j.isci.2022.104883. eCollection 2022 Sep 16.
3
CRISPRCasStack: a stacking strategy-based ensemble learning framework for accurate identification of Cas proteins.
Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac335.
4
NEPTUNE: A novel computational approach for accurate and large-scale identification of tumor homing peptides.
Comput Biol Med. 2022 Sep;148:105700. doi: 10.1016/j.compbiomed.2022.105700. Epub 2022 Jun 7.
5
SAPPHIRE: A stacking-based ensemble learning framework for accurate prediction of thermophilic proteins.
Comput Biol Med. 2022 Jul;146:105704. doi: 10.1016/j.compbiomed.2022.105704. Epub 2022 Jun 7.
6
AMYPred-FRL is a novel approach for accurate prediction of amyloid proteins by using feature representation learning.
Sci Rep. 2022 May 11;12(1):7697. doi: 10.1038/s41598-022-11897-z.
7
SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins.
Sci Rep. 2022 Mar 8;12(1):4106. doi: 10.1038/s41598-022-08173-5.
8
iTTCA-MFF: identifying tumor T cell antigens based on multiple feature fusion.
Immunogenetics. 2022 Oct;74(5):447-454. doi: 10.1007/s00251-022-01258-5. Epub 2022 Mar 5.
9
UMPred-FRL: A New Approach for Accurate Prediction of Umami Peptides Using Feature Representation Learning.
Int J Mol Sci. 2021 Dec 4;22(23):13124. doi: 10.3390/ijms222313124.
10
StackDPPIV: A novel computational approach for accurate prediction of dipeptidyl peptidase IV (DPP-IV) inhibitory peptides.
Methods. 2022 Aug;204:189-198. doi: 10.1016/j.ymeth.2021.12.001. Epub 2021 Dec 6.