Cui J, Han L Y, Lin H H, Zhang H L, Tang Z Q, Zheng C J, Cao Z W, Chen Y Z
Bioinformatics and Drug Design Group, Department of Pharmacy and Department of Computational Science, National University of Singapore, Singapore 117543, Republic of Singapore.
Mol Immunol. 2007 Feb;44(5):866-77. doi: 10.1016/j.molimm.2006.04.001. Epub 2006 Jun 27.
Peptide binding to MHC is critical for antigen recognition by T-cells. To facilitate vaccine design, computational methods have been developed for predicting MHC-binding peptides, which achieve impressive prediction accuracies of 70-90% for binders and 40-80% for non-binders. These methods have been developed for peptides of fixed lengths and for a limited number of alleles, trained on a small number of non-binders, and in some cases based directly on sequence. These constraints limit prediction coverage and accuracy, particularly for non-binders. It is desirable to explore methods that predict binders of flexible lengths from sequence-derived physicochemical properties and that are trained on diverse sets of non-binders. This work explores support vector machines (SVM) as such a method, developing prediction systems for 18 MHC class I and 12 class II alleles using 4208-3252 binders and 234,333-168,793 non-binders, evaluated on an independent set of 545-476 binders and 110,564-84,430 non-binders. Binder accuracies are 86-99% for 25 alleles and 70-80% for 5 alleles; non-binder accuracies are 96-99% for all 30 alleles. Compared with other results, binder accuracies are comparable and non-binder accuracies are substantially improved. Our method correctly predicts 73.3% of the 15 epitopes newly published in the last 4 months of 2005. Of the 251 recently published HLA-A*0201 non-epitopes predicted as binders by other methods, 63 are predicted as binders by our method. Screening of the HIV-1 genome shows that, compared to other methods, a comparable percentage (75-100%) of its known epitopes is correctly predicted, while a lower percentage (0.01-5% for 24 alleles and 5-8% for 6 alleles) of its constituent peptides is predicted as binders. Our software can be accessed at .
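The abstract reports accuracy separately for binders and non-binders rather than as one overall figure. With non-binders outnumbering binders by roughly fifty to one in the sets described, a single overall accuracy would be dominated by the non-binder class. The following is a minimal sketch of how such per-class accuracies can be computed, assuming they correspond to the usual sensitivity (fraction of true binders predicted as binders) and specificity (fraction of true non-binders predicted as non-binders). The function name, the 1/0 label encoding, and the toy labels are illustrative assumptions, not taken from the paper.

```python
def per_class_accuracy(y_true, y_pred):
    """Return (binder_accuracy, non_binder_accuracy) as fractions.

    Assumed label encoding (hypothetical): 1 = binder, 0 = non-binder.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # binders predicted as binders
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # binders missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-binders predicted as non-binders
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-binders predicted as binders

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    return tp / (tp + fn), tn / (tn + fp)


# Toy data only: 4 binders followed by 6 non-binders.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
binder_acc, non_binder_acc = per_class_accuracy(toy_true, toy_pred)
```

In this toy case three of the four binders and five of the six non-binders are predicted correctly, so the two values differ even though the classifier makes one error in each class.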