Zhu Shanfeng, Udaka Keiko, Sidney John, Sette Alessandro, Aoki-Kinoshita Kiyoko F, Mamitsuka Hiroshi
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji 611-0011, Japan.
Bioinformatics. 2006 Jul 1;22(13):1648-55. doi: 10.1093/bioinformatics/btl141. Epub 2006 Apr 13.
Various computational methods have been proposed for predicting whether a peptide binds a specific MHC molecule. These methods learn from known binding peptide sequences. However, currently available peptide databases contain relatively few examples and are highly redundant. Existing studies show that MHC molecules can be classified into supertypes according to their peptide-binding specificities. We therefore first give a method, based on information entropy, for reducing the redundancy in a given dataset. We then present a novel prediction approach that learns a predictive model from binders not only of the molecule of interest but also of other MHC molecules.
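As a rough illustration of the quantity behind an entropy-based redundancy measure, the sketch below computes the Shannon entropy of the residue distribution at each position of a set of equal-length peptides. The abstract does not specify how entropy is applied, so this is not the authors' procedure; the function name `position_entropy` and the example peptides are hypothetical.

```python
import math
from collections import Counter


def position_entropy(peptides):
    """Return the Shannon entropy (in bits) of the residues at each position.

    Assumption: all peptides have the same length (e.g. nonamers).
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")

    entropies = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        total = sum(counts.values())
        # H = -sum(p * log2(p)) over the residues observed at position i
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


# Hypothetical nonamers, invented for illustration only (not real binding data).
peptides = ["ALAKAAAAV", "ALAKAAAAL", "GLAKAAAAV", "GLAKAAAAL"]
entropies = position_entropy(peptides)
```

A position where every peptide carries the same residue has entropy 0, while a position split evenly between two residues has entropy 1 bit. A dataset dominated by near-identical sequences therefore shows low entropy at most positions.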
We experimented on the HLA-A family using binding nonamers of the A1 supertype (HLA-A0101, A2601, A2902, A3002), the A2 supertype (A0201, A0202, A0203, A0206, A6802), the A3 supertype (A0301, A1101, A3101, A3301, A6801) and the A24 supertype (A2301 and A2402). The data were collected from six publicly available peptide databases and two private sources. The results show that our approach significantly improves the accuracy of predicting peptides that bind a specific HLA molecule when binding data from HLA molecules in the same supertype are combined. Our approach can thus help find new binders for MHC molecules.
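The idea of combining binding data within a supertype can be sketched as a simple pooling step. The supertype membership below is taken from the groupings listed above (with "HLA-A0101" written as "A0101"); the function names, the deduplication step, and the example peptides are assumptions for illustration, and the sketch does not reproduce the authors' predictive model.

```python
# Supertype membership as listed in the abstract.
SUPERTYPES = {
    "A1": ["A0101", "A2601", "A2902", "A3002"],
    "A2": ["A0201", "A0202", "A0203", "A0206", "A6802"],
    "A3": ["A0301", "A1101", "A3101", "A3301", "A6801"],
    "A24": ["A2301", "A2402"],
}


def supertype_of(allele):
    """Return the supertype name that contains the given allele."""
    for name, members in SUPERTYPES.items():
        if allele in members:
            return name
    raise KeyError(f"unknown allele: {allele}")


def pooled_binders(target, binders_by_allele):
    """Collect binders of every allele in the target's supertype.

    Duplicated peptides are kept once, in first-seen order
    (an assumption of this sketch, not a step stated in the abstract).
    """
    pooled, seen = [], set()
    for allele in SUPERTYPES[supertype_of(target)]:
        for peptide in binders_by_allele.get(allele, []):
            if peptide not in seen:
                seen.add(peptide)
                pooled.append(peptide)
    return pooled


# Hypothetical binder lists, invented for illustration only.
binders = {
    "A0201": ["ALAKAAAAV"],
    "A0206": ["ALAKAAAAV", "GLAKAAAAL"],
    "A0301": ["KLAKAAAAK"],
}
training_set = pooled_binders("A0201", binders)
```

Here `training_set` for A0201 contains the peptides of A0201 and A0206 but not those of A0301, which belongs to a different supertype.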