School of Computer, Wuhan University, Wuhan, China.
PLoS One. 2012;7(8):e43575. doi: 10.1371/journal.pone.0043575. Epub 2012 Aug 21.
Conformational B-cell epitopes are the specific sites on antigens that have immune functions. Identifying them is of great importance to immunologists because it facilitates the design of peptide-based vaccines. To narrow the search for experimental validation, various computational models have been developed that predict epitopes from antigen structures. However, the application of these models is limited by the small number of available antigen structures. In contrast to most available structure-based methods, we attempt here to accurately predict conformational B-cell epitopes from antigen sequences.
In this paper, we explore various sequence-derived features that have been observed to be associated with the location of epitopes or have previously been used in similar tasks. These features are evaluated and ranked by their discriminative performance on the benchmark datasets. From the perspective of information science, a combination of features usually leads to better results than individual features. To build a robust model, we adopt an ensemble learning approach to incorporate the various features and develop an ensemble model that predicts conformational epitopes from antigen sequences.
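As a rough illustration of combining sequence-derived features, the sketch below smooths a per-residue scale over a sliding window and averages several normalised per-residue score lists into one ensemble score. The abstract does not state which features or which combination rule the authors use, so the scale values, the window size, and plain score averaging are all assumptions made for illustration only.

```python
from statistics import mean

# Hypothetical per-residue scale; the values are illustrative, not a published scale.
TOY_SCALE = {"A": -0.5, "D": 3.0, "G": 0.0, "K": 3.0, "L": -1.8, "S": 0.3}


def window_average(sequence, scale, half_width=1):
    """Average a per-residue scale over a window centred on each residue.

    Residues missing from the scale contribute 0.0; windows are truncated
    at the sequence ends.
    """
    values = [scale.get(residue, 0.0) for residue in sequence]
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to 0.5 everywhere."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def ensemble_scores(per_feature_scores):
    """Average normalised per-residue scores across features (assumed rule)."""
    normalised = [min_max(scores) for scores in per_feature_scores]
    return [mean(column) for column in zip(*normalised)]
```

Each inner list passed to `ensemble_scores` holds one score per residue from one feature or base predictor. Normalising before averaging keeps a feature with a large numeric range from dominating the combined score.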
Evaluated by leave-one-out cross-validation, the proposed method achieves mean AUC scores of 0.687 and 0.651 on two datasets compiled from bound structures and unbound structures, respectively. When compared with publicly available servers on an independent dataset, our method yields better or comparable performance. The results demonstrate that the proposed method is useful for sequence-based conformational epitope prediction.
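The evaluation protocol can be sketched as follows, assuming that "leave-one-out" means holding out one antigen at a time and that the reported figure is the mean of the per-antigen AUCs; the abstract does not spell out either detail. The `train` and `predict` callables are hypothetical placeholders supplied by the caller, not part of the authors' software.

```python
from statistics import mean


def auc(labels, scores):
    """AUC as the fraction of (epitope, non-epitope) residue pairs ranked
    correctly, with ties counted as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def leave_one_antigen_out(antigens, train, predict):
    """Mean per-antigen AUC under leave-one-antigen-out cross-validation.

    antigens: list of (features, labels) pairs, one pair per antigen.
    train:    hypothetical callable mapping the training antigens to a model.
    predict:  hypothetical callable mapping (model, features) to one score
              per residue.
    """
    aucs = []
    for i, (features, labels) in enumerate(antigens):
        training = antigens[:i] + antigens[i + 1:]  # hold out antigen i
        model = train(training)
        aucs.append(auc(labels, predict(model, features)))
    return mean(aucs)
```

The pairwise AUC is quadratic in the number of residues, which is acceptable for single antigens; a rank-based formulation would scale better on large inputs.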
The web server and datasets are freely available at http://bcell.whu.edu.cn.