Teimouri Hamid, Ghoreyshi Zahra S, Kolomeisky Anatoly B, George Jason T
Department of Chemistry, Rice University, Houston, TX, 77005, USA.
Center for Theoretical Biological Physics, Rice University, Houston, TX, 77005, USA.
bioRxiv. 2024 Oct 13:2024.10.11.617901. doi: 10.1101/2024.10.11.617901.
T-cell receptors (TCRs) play a critical role in the immune response by recognizing specific ligand peptides presented by major histocompatibility complex (MHC) molecules. Accurate prediction of peptide binding to TCRs is essential for advancing immunotherapy, vaccine design, and the understanding of mechanisms of autoimmune disorders. This study presents a novel theoretical method that examines how feature selection techniques affect the predictive accuracy of peptide binding models tailored to specific TCRs. To evaluate the universality of our approach across different TCR systems, we used a dataset that includes peptide libraries tested against three distinct murine TCRs. A broad range of physicochemical properties, including amino acid composition, dipeptide composition, and tripeptide features, were integrated into a machine learning-based feature selection framework to identify the key features contributing to binding affinity. Our analysis reveals that using optimized feature subsets not only reduces model complexity but also improves predictive performance, enabling more precise identification of TCR-peptide interactions. The results of our feature selection method are consistent with experimental data and with findings from hybrid approaches that use both sequence and structural data as input. Our theoretical approach highlights the role of feature selection in peptide-TCR interactions, providing a powerful tool for uncovering the molecular mechanisms of the T-cell response and assisting in the design of more advanced targeted therapeutics.
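As an illustration of the sequence-derived features named in the abstract, the following minimal sketch computes amino acid composition and dipeptide composition for a single peptide. It is not the authors' implementation: the function names, the `AAC_`/`DPC_` key prefixes, and the example peptide are assumptions made here for illustration, and the normalization (fraction of residues, fraction of adjacent pairs) is one common convention rather than one confirmed by the abstract.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide):
    """Fraction of residues in the peptide that are each amino acid."""
    if not peptide:
        raise ValueError("peptide must be a non-empty string")
    n = len(peptide)
    return {aa: peptide.count(aa) / n for aa in AMINO_ACIDS}


def dipeptide_composition(peptide):
    """Fraction of adjacent residue pairs that are each of the 400 dipeptides."""
    counts = {"".join(p): 0 for p in product(AMINO_ACIDS, repeat=2)}
    pairs = [peptide[i:i + 2] for i in range(len(peptide) - 1)]
    for pair in pairs:
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = max(len(pairs), 1)  # avoid division by zero for 1-residue input
    return {dp: c / total for dp, c in counts.items()}


def featurize(peptide):
    """Combine both compositions into one flat feature dictionary (420 entries)."""
    features = {f"AAC_{k}": v for k, v in amino_acid_composition(peptide).items()}
    features.update(
        {f"DPC_{k}": v for k, v in dipeptide_composition(peptide).items()}
    )
    return features


if __name__ == "__main__":
    # Hypothetical peptide, chosen only to exercise the functions.
    example = featurize("AAC")
    nonzero = {k: v for k, v in example.items() if v > 0}
    print(len(example), nonzero)
```

A feature selection step, as described in the abstract, would then operate on a matrix built from such dictionaries, one row per peptide, keeping only the columns most informative for binding to a given TCR.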