School of Mathematical Sciences, Dalian University of Technology, Dalian 116023, China; College of Science, Hebei University of Science and Technology, Shijiazhuang 050018, China; School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China.
J Zhejiang Univ Sci B. 2013 Sep;14(9):816-28. doi: 10.1631/jzus.B1200299.
Proteasomes are responsible for producing the majority of cytotoxic T lymphocyte (CTL) epitopes. Hence, it is important to identify correctly which peptides proteasomes will generate from an unknown protein. However, the proteasome cleavage data used to train prediction algorithms, whether derived from major histocompatibility complex (MHC) I ligands or from in vitro digestion, are not identical to the products of in vivo proteasomal digestion. Therefore, the accuracy and reliability of these models still need to be improved. In this paper, three types of proteasomal cleavage data, namely constitutive proteasome (cCP) in vitro cleavage data, immunoproteasome (iCP) in vitro cleavage data, and MHC I ligand data, were used to train cleavage-site prediction methods based on the kernel-function stabilized matrix method (KSMM). The predictive accuracies of KSMM with pair coefficients were 75.0%, 72.3%, and 83.1% for the cCP, iCP, and MHC I ligand data, respectively, which were comparable to the results obtained with a support vector machine (SVM). The three proteasomal cleavage methods were each combined in turn with MHC I-peptide binding predictions to model the MHC I-peptide processing and presentation pathway. These integrations markedly improved MHC I peptide identification, raising the area under the receiver operating characteristic (ROC) curve (AUC) from 0.82 to 0.91. The results suggested that both MHC I ligand data and proteasomal in vitro degradation data can accurately simulate in vivo proteasomal digestion. The information extracted from the cCP and iCP in vitro cleavage data demonstrated that both cCP and iCP are selective in the peptide bonds they use for cleavage.
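As a rough illustration of the integration step, the sketch below combines a per-peptide MHC I binding score with a cleavage score and compares the AUC before and after. The abstract does not state how the two predictions were combined, so the weighted sum in `combine`, the `weight` parameter, and all score values are assumptions made up for this example; they are not the paper's method or data. The `auc` helper uses the standard pairwise (Mann-Whitney) definition of the area under the ROC curve.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Counts the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def combine(binding, cleavage, weight=0.5):
    """Hypothetical integration: weighted sum of binding and cleavage scores.

    This is an assumed combination rule for illustration only.
    """
    return [(1 - weight) * b + weight * c for b, c in zip(binding, cleavage)]


# Toy, invented data: 1 = presented MHC I peptide, 0 = not presented.
labels = [1, 1, 1, 0, 0, 0]
binding = [0.9, 0.4, 0.7, 0.6, 0.3, 0.2]   # assumed binding-prediction scores
cleavage = [0.8, 0.9, 0.6, 0.1, 0.5, 0.2]  # assumed cleavage-prediction scores

auc_binding = auc(binding, labels)
auc_combined = auc(combine(binding, cleavage), labels)
print(f"AUC, binding only: {auc_binding:.3f}")
print(f"AUC, binding + cleavage: {auc_combined:.3f}")
```

In this toy case the cleavage score rescues a true ligand that the binding score alone ranks below a non-ligand, which is the kind of effect an integrated model is meant to capture. The size of any real improvement depends on the data and on the combination rule actually used.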