Faculty of Chemistry, Northeast Normal University, Changchun 130024, China.
Molecules. 2011 Jun 16;16(6):4971-93. doi: 10.3390/molecules16064971.
Epitope prediction based on random peptide library screening has become a promising focus of immunoinformatics research. Several new software tools and web servers have been proposed in recent years and have succeeded on their own test cases. However, because few mimotope sets are available together with a solved structure of the corresponding template-target complex, these methods have not yet been evaluated systematically. In this study, a new benchmark dataset was defined. Using this benchmark dataset and a representative dataset, five of the most popular epitope prediction programs based on random peptide library screening were evaluated. On the benchmark dataset, no method exceeded a precision of 0.42 or a sensitivity of 0.37, and the MCC scores suggest that the predictions of these programs exceed random prediction by about 0.09-0.13. On the representative dataset, most of these performance measures improved slightly, but overall performance was still unsatisfactory. Many test cases in the benchmark dataset could not be processed by these programs because of software limitations. Moreover, these programs may well be overfitted to the small datasets on which they were developed and may fail on other cases. The correlation between mimotopes and genuine epitope residues is therefore still far from resolved, and a much larger dataset for mimotope-based epitope prediction is desirable.
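The abstract reports precision, sensitivity, and MCC without stating how they were computed. The following is a minimal sketch of the standard confusion-matrix definitions applied per residue, assuming that predicted and genuine epitope residues are given as sets of residue identifiers and that negatives are drawn from a set of candidate residues (for example, the surface residues of the antigen). The function name `epitope_metrics` and its parameters are hypothetical and do not come from the evaluated software.

```python
from math import sqrt


def epitope_metrics(predicted, actual, candidates):
    """Return (precision, sensitivity, mcc) for one test case.

    predicted  -- residue IDs the tool labels as epitope (assumed input format)
    actual     -- genuine epitope residue IDs from the complex structure
    candidates -- all residue IDs that could have been predicted
    """
    predicted, actual, candidates = set(predicted), set(actual), set(candidates)

    tp = len(predicted & actual)                   # correctly predicted epitope residues
    fp = len(predicted - actual)                   # predicted but not genuine
    fn = len(actual - predicted)                   # genuine but missed
    tn = len(candidates - predicted - actual)      # correctly left out

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0

    # Matthews correlation coefficient; 0 corresponds to random prediction.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return precision, sensitivity, mcc


if __name__ == "__main__":
    # Toy case: 20 candidate residues, 5 genuine epitope residues, 4 predicted.
    p, s, m = epitope_metrics({4, 5, 6, 7}, {1, 2, 3, 4, 5}, range(1, 21))
    print(f"precision={p:.2f} sensitivity={s:.2f} MCC={m:.3f}")
```

Because an MCC of 0 corresponds to random prediction, MCC values of about 0.09-0.13 are the amount by which a predictor exceeds random.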