Department of Microbiology and Immunology, University of Rochester Medical Center, Rochester, NY 14642, USA.
J Immunol. 2012 May 1;188(9):4235-48. doi: 10.4049/jimmunol.1103640. Epub 2012 Mar 30.
The ability to track CD4 T cells elicited in response to pathogen infection or vaccination is critical because of the role these cells play in protective immunity. Given advances in genome sequencing of pathogenic organisms, there is considerable appeal in implementing computer-based algorithms to predict peptides that bind to class II molecules, forming the complex recognized by CD4 T cells. Despite recent progress in this area, there is a paucity of data regarding the success of these algorithms in identifying actual pathogen-derived epitopes. In this study, we sought to rigorously evaluate the performance of multiple Web-available algorithms by comparing their predictions with our results for three independent class II molecules, obtained by purely empirical methods for epitope discovery in influenza that used overlapping peptides and cytokine ELISPOTs. We analyzed the data in different ways, trying to anticipate how an investigator might use these computational tools for epitope discovery. We conclude that currently available algorithms can indeed facilitate epitope discovery, but all showed a high degree of false-positive and false-negative predictions; efficiencies were therefore low. We also found dramatic disparities among algorithms and between predicted IC50 values and true dissociation rates of peptide-MHC class II complexes. We suggest that improved success of predictive algorithms will depend less on changes in computational methods or larger data sets and more on changes in the parameters used to "train" the algorithms, so as to factor in elements of T cell repertoire and peptide acquisition by class II molecules.