Guo Linyuan, Luo Cheng, Zhu Shanfeng
BMC Genomics. 2013;14 Suppl 5(Suppl 5):S11. doi: 10.1186/1471-2164-14-S5-S11. Epub 2013 Oct 16.
Computational methods for predicting Major Histocompatibility Complex (MHC) class II binding peptides play an important role in understanding immune recognition and in epitope discovery. An effective computational method must account for two important characteristics of the problem: (1) the length of binding peptides is highly flexible; and (2) MHC molecules are extremely polymorphic, and for the vast majority of them there is insufficient training data.
We develop a novel string kernel method, MHC2SK (MHC-II String Kernel), to measure the similarities among peptides of variable length. By considering the distinct features of the MHC-II peptide binding prediction problem, MHC2SK differs significantly from the recently developed kernel-based method, the GS (Generic String) kernel, in the way it computes similarities. Furthermore, we extend MHC2SK to MHC2SKpan for pan-specific MHC-II peptide binding prediction by leveraging the binding data of various MHC molecules.
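To illustrate the general idea of a string kernel over peptides of different lengths, the following is a minimal sketch of a plain k-mer spectrum kernel with cosine normalization. It is not MHC2SK or the GS kernel, whose definitions are not given in this abstract; the function names, the choice of k = 3, and the example peptides are assumptions made purely for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(peptide: str, k: int) -> Counter:
    """Count every contiguous substring of length k in the peptide."""
    return Counter(peptide[i:i + k] for i in range(len(peptide) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> float:
    """Inner product of the k-mer count vectors of two peptides.

    This is a generic spectrum kernel, used here only as a stand-in;
    it is NOT the MHC2SK kernel described in the abstract.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return float(sum(ca[m] * cb[m] for m in ca.keys() & cb.keys()))


def normalized_kernel(a: str, b: str, k: int = 3) -> float:
    """Cosine-normalize so that peptide length does not dominate the score."""
    kaa = spectrum_kernel(a, a, k)
    kbb = spectrum_kernel(b, b, k)
    if kaa == 0.0 or kbb == 0.0:
        # At least one peptide is shorter than k and has no k-mers.
        return 0.0
    return spectrum_kernel(a, b, k) / sqrt(kaa * kbb)


if __name__ == "__main__":
    # Illustrative peptides of different lengths (13 and 6 residues).
    print(normalized_kernel("PKYVKQNTLKLAT", "YVKQNT"))
```

Because the kernel compares bags of k-mers rather than aligned positions, it accepts peptides of any length, which is the property the first characteristic of the problem calls for. A kernel of this kind could then be supplied as a precomputed Gram matrix to a kernel-based learner.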
MHC2SK outperformed GS in allele-specific prediction on a benchmark dataset, which demonstrates the effectiveness of MHC2SK. Furthermore, we evaluated the performance of MHC2SKpan on various benchmark datasets from several perspectives: leave-one-allele-out (LOO), 5-fold cross-validation, and independent data testing. MHC2SKpan achieved performance comparable to NetMHCIIpan-2.0 and outperformed NetMHCIIpan-1.0, TEPITOPEpan and MultiRTA, with the differences being statistically significant. MHC2SKpan can be freely accessed at http://datamining-iip.fudan.edu.cn/service/MHC2SKpan/index.html.
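The leave-one-allele-out protocol mentioned above can be sketched as follows: all binding records for one allele are held out for testing while the records for every other allele form the training set, which mimics predicting for an MHC molecule with no training data of its own. The record layout and the function name are hypothetical, and the toy records are placeholders rather than real measurements.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical record layout: (allele name, peptide sequence, binding label).
Record = Tuple[str, str, float]


def leave_one_allele_out(
    records: List[Record],
) -> Iterator[Tuple[str, List[Record], List[Record]]]:
    """Yield (held_out_allele, train, test), one split per allele."""
    by_allele: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_allele[rec[0]].append(rec)
    for held_out in sorted(by_allele):
        # Training data come only from the other alleles.
        train = [r for a, rs in by_allele.items() if a != held_out for r in rs]
        yield held_out, train, by_allele[held_out]


if __name__ == "__main__":
    # Toy placeholder records; the labels are invented for illustration.
    toy = [
        ("DRB1*01:01", "PKYVKQNTLKLAT", 1.0),
        ("DRB1*01:01", "AAAYVKQ", 0.0),
        ("DRB1*04:01", "YVKQNT", 1.0),
        ("DRB1*15:01", "GELIGILNAAKVPAD", 0.0),
    ]
    for allele, train, test in leave_one_allele_out(toy):
        print(allele, len(train), len(test))
```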