Medical Genetics Institute, Ho Chi Minh City, Vietnam.
NexCalibur Therapeutics, Wilmington, DE, United States.
Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad284.
MOTIVATION: Predicting binding between a T-cell receptor (TCR) and a peptide presented by a human leucocyte antigen molecule is highly challenging and a key bottleneck in the development of immunotherapy. Existing prediction tools, despite performing well on the datasets they were built with, suffer from low true positive rates when used to predict epitopes capable of eliciting T-cell responses in patients. An improved TCR-peptide prediction tool, built on a large dataset that combines existing publicly available data, is therefore still needed.
RESULTS: We collected data from five public databases (IEDB, TBAdb, VDJdb, McPAS-TCR, and 10X) to form a dataset of >3 million TCR-peptide pairs, 3.27% of which were binding interactions. We proposed epiTCR, a Random Forest-based method for predicting TCR-peptide interactions. epiTCR takes simple input, namely TCR CDR3β sequences and antigen sequences, encoded by flattened BLOSUM62. epiTCR achieved an area under the curve of 0.98 and higher sensitivity (0.94) than other existing tools (NetTCR, Imrex, ATM-TCR, and pMTnet), while maintaining comparable prediction specificity (0.9). We identified seven epitopes that accounted for 98.67% of the false positives predicted by epiTCR and that had similar effects on other tools. We also demonstrated a considerable influence of peptide sequences on prediction, highlighting the need for more diverse peptides in a more balanced dataset. In conclusion, epiTCR is among the best-performing tools, thanks to its use of combined data from public sources, and its use will contribute to the search for neoantigens for precision cancer immunotherapy.
AVAILABILITY AND IMPLEMENTATION: epiTCR is available on GitHub (https://github.com/ddiem-ri-4D/epiTCR).
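The "flattened BLOSUM62" input described in the abstract can be pictured with a minimal sketch. It assumes that each residue is replaced by its row of substitution scores, that rows are concatenated, and that shorter sequences are zero-padded to a fixed length; the padding scheme, the maximum lengths, and the function names `encode_flat` and `encode_pair` are illustrative assumptions, not details taken from the epiTCR code. To stay self-contained, the example uses only a three-residue excerpt of BLOSUM62 rather than the full 20x20 matrix.

```python
from typing import Dict, List

# Three-residue excerpt of BLOSUM62 (A, C, G only), standing in for the
# full 20x20 matrix so that the sketch needs no external dependency.
ALPHABET = "ACG"
BLOSUM62_EXCERPT: Dict[str, Dict[str, int]] = {
    "A": {"A": 4, "C": 0, "G": 0},
    "C": {"A": 0, "C": 9, "G": -3},
    "G": {"A": 0, "C": -3, "G": 6},
}


def encode_flat(seq: str, max_len: int,
                matrix: Dict[str, Dict[str, int]],
                alphabet: str) -> List[int]:
    """Concatenate one matrix row per residue, then zero-pad to max_len rows.

    Zero-padding is an assumption of this sketch.
    """
    if len(seq) > max_len:
        raise ValueError(f"sequence longer than max_len={max_len}: {seq!r}")
    features: List[int] = []
    for residue in seq:
        row = matrix[residue]  # raises KeyError for residues outside the matrix
        features.extend(row[other] for other in alphabet)
    # Pad so every sequence yields the same number of features.
    features.extend([0] * (len(alphabet) * (max_len - len(seq))))
    return features


def encode_pair(cdr3b: str, peptide: str, max_cdr3: int, max_pep: int,
                matrix: Dict[str, Dict[str, int]],
                alphabet: str) -> List[int]:
    """Feature vector for one TCR-peptide pair: CDR3β part, then peptide part."""
    return (encode_flat(cdr3b, max_cdr3, matrix, alphabet)
            + encode_flat(peptide, max_pep, matrix, alphabet))


# Toy sequences over the excerpt alphabet; the maximum lengths are hypothetical.
pair_vector = encode_pair("CAG", "GA", max_cdr3=4, max_pep=3,
                          matrix=BLOSUM62_EXCERPT, alphabet=ALPHABET)
```

With the full matrix (which could be loaded, for example, with Biopython's `substitution_matrices.load("BLOSUM62")`), each residue would contribute 20 values instead of 3. Fixed-length vectors of this kind are the sort of input a Random Forest implementation such as scikit-learn's `RandomForestClassifier` accepts.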
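The abstract reports sensitivity and specificity on a dataset in which only 3.27% of pairs are binders. The sketch below shows how those two metrics are defined and, as a back-of-the-envelope illustration only, what precision they would imply if a classifier with sensitivity 0.94 and specificity 0.9 were applied at that 3.27% prevalence. The implied precision is not a figure reported in the abstract, and the published metrics may come from test sets with a different class ratio; the function names are hypothetical.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of real binders that are predicted as binders."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: share of non-binders that are predicted as non-binders."""
    return tn / (tn + fp)


def implied_precision(sens: float, spec: float, prevalence: float) -> float:
    """Precision implied by sensitivity, specificity, and class prevalence (Bayes' rule)."""
    true_pos = sens * prevalence                  # binders correctly flagged
    false_pos = (1.0 - spec) * (1.0 - prevalence) # non-binders wrongly flagged
    return true_pos / (true_pos + false_pos)


# Values taken from the abstract; applying them at this prevalence is an assumption.
precision = implied_precision(sens=0.94, spec=0.90, prevalence=0.0327)
print(f"implied precision: {precision:.2f}")
```

Under that assumption the implied precision is roughly 0.24, which illustrates why false positives weigh heavily when binders are rare and is consistent with the abstract's call for a more balanced dataset.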