Shanghai Public Health Clinical Center, Fudan University, Shanghai, 200032, China.
Shanghai 10th People's Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China.
Sci Data. 2020 Jul 6;7(1):212. doi: 10.1038/s41597-020-0555-y.
Antigenicity measurement plays a fundamental role in vaccine design, which requires antigen selection from a large number of mutants. To augment traditional cross-reactivity experiments, computational approaches for predicting the antigenic distance between multiple protein antigens are highly valuable. The performance of in silico models relies heavily on large-scale benchmark datasets, yet the underlying data are scattered among public databases and published articles or reports. Here, we present the first benchmark dataset of protein antigens with experimental evidence to guide in silico antigenicity calculations. This dataset includes (1) standard haemagglutination-inhibition (HI) tests for 3,867 influenza A/H3N2 strain pairs, (2) standard HI tests for 559 influenza B virus strain pairs, and (3) neutralization titres derived from 1,073 dengue virus strain pairs. All of these datasets were collated and annotated with experimentally validated antigenicity relationships as well as sequence information for the corresponding protein antigens. We anticipate that this work will provide a benchmark dataset for in silico antigenicity prediction that could be further used to assist in epidemic surveillance and therapeutic vaccine design for viruses with variable antigenicity.