Ren Jing, Song Jiangning, Ellis John, Li Jinyan
Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW 2007, Australia.
College of Computer, National University of Defense Technology, Changsha, 410073, China.
BMC Genomics. 2017 Mar 14;18(Suppl 2):113. doi: 10.1186/s12864-017-3493-0.
The broad heterogeneity of antigen-antibody interactions makes it very difficult to design a widely applicable learning algorithm for identifying conformational B-cell epitopes. Besides the intrinsic heterogeneity introduced by diverse species, the use of different data sources introduces further heterogeneity, adding another layer of complexity that confounds the analysis.
This work proposes a staged heterogeneity learning method, which learns both the characteristics and the heterogeneity of the data in phases. The method was applied to identify antigenic residues of heterogeneous conformational B-cell epitopes from antigen sequences. In the first stage, the model learns the general epitope patterns of each kind of propensity from a large data set containing computationally defined epitopes. In the second stage, the model learns the heterogeneous complementarity of these propensities from a relatively small guided data set containing experimentally determined epitopes. Moreover, we designed an algorithm that clusters the predicted individual antigenic residues into conformational B-cell epitopes, which makes the predictions directly usable in real-world applications such as vaccine development. Because the heterogeneity is learned explicitly, the prediction model transfers markedly better to new, highly heterogeneous data. The model was tested on two data sets with experimentally determined epitopes and on one data set with computationally defined epitopes. The proposed sequence-based method achieved about twice the performance of existing methods, including the sequence-based predictor CBTOPE and three structure-based predictors.
The proposed method uses only antigen sequence information, and is therefore applicable far more broadly than methods that require antigen structures.
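The two-stage idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the propensity values, the toy sequences and labels, and all function names (`window_feature`, `train_stage1`, `train_stage2`, `predict`) are assumptions made for illustration. Stage 1 fits one simple scorer per propensity on a larger set; stage 2 fits a logistic combiner over those scores on a smaller guided set. The residue clustering step is not covered here.

```python
import math

# Hypothetical toy propensity scales (illustrative values, not from the paper).
PROPENSITIES = {
    "hydrophilicity": {"A": -0.5, "D": 3.0, "K": 3.0, "L": -1.8},
    "flexibility": {"A": 0.36, "D": 0.51, "K": 0.47, "L": 0.37},
}

def window_feature(seq, i, scale, half=1):
    """Average a propensity scale over a window centred on residue i."""
    lo, hi = max(0, i - half), min(len(seq), i + half + 1)
    return sum(scale[a] for a in seq[lo:hi]) / (hi - lo)

def train_stage1(data):
    """Stage 1: per propensity, learn a midpoint and direction that
    separate epitope from non-epitope residues on the large set."""
    models = {}
    for name, scale in PROPENSITIES.items():
        pos, neg = [], []
        for seq, labels in data:
            for i, y in enumerate(labels):
                (pos if y else neg).append(window_feature(seq, i, scale))
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        models[name] = ((mean_pos + mean_neg) / 2, mean_pos - mean_neg)
    return models

def base_scores(models, seq, i):
    """Score residue i with every stage-1 propensity model."""
    return [
        (window_feature(seq, i, PROPENSITIES[name]) - mid) * direction
        for name, (mid, direction) in models.items()
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_stage2(models, data, epochs=200, lr=0.1):
    """Stage 2: learn how the propensity scores complement each other,
    here with logistic regression trained by stochastic gradient descent."""
    w, b = [0.0] * len(models), 0.0
    for _ in range(epochs):
        for seq, labels in data:
            for i, y in enumerate(labels):
                x = base_scores(models, seq, i)
                g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
                w = [wi - lr * g * xi for wi, xi in zip(w, x)]
                b -= lr * g
    return w, b

def predict(models, w, b, seq):
    """Return one epitope probability per residue of seq."""
    return [
        sigmoid(sum(wi * xi for wi, xi in zip(w, base_scores(models, seq, i))) + b)
        for i in range(len(seq))
    ]

# Toy data: 1 marks an epitope residue. Both sets are made up.
large_set = [("AALDKDLLA", [0, 0, 0, 1, 1, 1, 0, 0, 0]),
             ("LLDKDAAL", [0, 0, 1, 1, 1, 0, 0, 0])]
guided_set = [("LADKDALL", [0, 0, 1, 1, 1, 0, 0, 0])]

stage1 = train_stage1(large_set)
weights, bias = train_stage2(stage1, guided_set)
probs = predict(stage1, weights, bias, "AALDKDLAA")
```

In this sketch the stage-1 scorers stay fixed while the combiner is trained, so the small guided set only has to determine a handful of weights rather than the full model.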