Huang Wenli, Yang Guobing, Zhao Xiaojun, Li Zerong
Int J Data Min Bioinform. 2014;10(2):189-205. doi: 10.1504/ijdmb.2014.064015.
In recent years, many machine learning methods have been developed to predict HLA-binding peptides. However, because these approaches include only a limited range of descriptors characterising protein features, their prediction accuracy is poor. In this study, we applied support vector machine (SVM) methods to predict peptides that bind to the major histocompatibility complex (MHC) class II molecule HLA-DRB1*0401, using six sets of molecular descriptors characterising the primary structures of the peptides. Some feature groups gave good predictions, with overall accuracies greater than 95%, whereas others performed poorly, with accuracies of only 50%. Additional feature selection improved performance significantly, raising the overall accuracy of every descriptor group or combination of descriptors above 90%. Of note, including the necessary informative and discriminative descriptors improved the prediction accuracies.
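The workflow summarised in the abstract (encode each peptide as sequence-derived descriptors, select the most discriminative descriptors, then train an SVM) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: amino acid composition stands in for the six descriptor sets, F-score ranking stands in for the unspecified feature selection step, a linear SVM trained with Pegasos-style stochastic subgradient descent stands in for the unspecified SVM implementation, and the peptides and labels are invented placeholders, not real binding data.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(peptide):
    """Amino acid composition: fraction of each of the 20 standard residues."""
    return [peptide.count(a) / len(peptide) for a in AMINO_ACIDS]

def f_score(column, labels):
    """Between-class separation of one feature over its within-class variance."""
    pos = [v for v, t in zip(column, labels) if t == 1]
    neg = [v for v, t in zip(column, labels) if t == -1]
    mean = lambda vs: sum(vs) / len(vs)
    var = lambda vs: sum((v - mean(vs)) ** 2 for v in vs) / max(len(vs) - 1, 1)
    between = (mean(pos) - mean(column)) ** 2 + (mean(neg) - mean(column)) ** 2
    # Small constant avoids division by zero when a feature never varies within a class.
    return between / (var(pos) + var(neg) + 1e-12)

def select_top_k(X, y, k):
    """Indices of the k features with the highest F-score."""
    scores = [f_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularised hinge loss."""
    rng = random.Random(seed)
    rows = [x + [1.0] for x in X]  # appended constant feature acts as the bias term
    w = [0.0] * len(rows[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(rows)), len(rows)):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, rows[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularisation)
            if margin < 1.0:  # sample violates the margin: step towards it
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, rows[i])]
    return w

def predict(w, x):
    """Return +1 (predicted binder) or -1 (predicted non-binder)."""
    score = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1

# Invented placeholder sequences and labels, NOT real HLA-DRB1*0401 binding data.
peptides = ["YKLVAALTA", "FRLMAAQTS", "WKLLAAMTG", "GDEPSGDNE", "PEDGSNDEG", "DSGEPNEDG"]
labels = [1, 1, 1, -1, -1, -1]

X = [composition(p) for p in peptides]
keep = select_top_k(X, labels, k=5)            # feature selection step
X_sel = [[row[j] for j in keep] for row in X]  # keep only the selected descriptors
w = train_linear_svm(X_sel, labels)
predictions = [predict(w, x) for x in X_sel]
```

In a real study, accuracy would be estimated on held-out peptides or by cross-validation, with the feature selection fitted inside each training fold so that the reported accuracy is not inflated.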