Hou Qingzhen, Stringer Bas, Waury Katharina, Capel Henriette, Haydarlou Reza, Xue Fuzhong, Abeln Sanne, Heringa Jaap, Feenstra K Anton
Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Shandong 250002, China.
National Institute of Health Data Science of China, Shandong University, Shandong 250002, China.
Bioinformatics. 2021 Oct 25;37(20):3421-3427. doi: 10.1093/bioinformatics/btab321.
Antibodies play an important role in clinical research and biotechnology. Their specificity is determined by the interaction with the antigen's epitope region, which is a special type of protein-protein interaction (PPI) interface. The wide availability of sequence data allows us to predict epitopes from sequence, so that time-consuming wet-lab experiments can be focused on the most promising epitope regions. Here, we extend our previously developed sequence-based predictors for homodimer and heterodimer PPI interfaces to predict epitope residues that have the potential to bind an antibody.
We collected and curated a high-quality epitope dataset from the SAbDab database. Our generic PPI heterodimer predictor obtained an AUC-ROC of 0.666 when evaluated on the epitope test set. We then trained a random forest model specifically on the epitope dataset, reaching an AUC of 0.694. Further training on the combined heterodimer and epitope datasets improves our final predictor to an AUC of 0.703 on the epitope test set. This is better than BepiPred-2.0, the best state-of-the-art sequence-based epitope predictor. On one solved antibody-antigen structure of the COVID-19 virus spike receptor binding domain, our predictor reaches an AUC of 0.778. We added the SeRenDIP-CE Conformational Epitope predictors to our webserver. The server is simple to use and requires only a single antigen sequence as input, which makes the method immediately applicable in a wide range of biomedical and biomolecular research.
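The AUC-ROC values reported above summarise how well per-residue scores separate epitope from non-epitope residues. As a minimal sketch of how such a value can be computed, the function below uses the rank-sum (Mann-Whitney U) formulation. The function name `auc_roc` and the toy labels and scores are illustrative assumptions, not part of the SeRenDIP-CE code or data.

```python
def auc_roc(labels, scores):
    """AUC-ROC from binary labels (1 = epitope residue, 0 = other) and
    predicted scores, using the Mann-Whitney U statistic.
    Tied scores receive their average rank."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sort residue indices by ascending score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the current run of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical toy example: four residues, two of them labelled as epitope.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_roc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for the reported values of 0.666 to 0.778.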
Webserver, source code and datasets are available at www.ibi.vu.nl/programs/serendipwww/.
Supplementary data are available at Bioinformatics online.