CSIR-Institute of Genomics and Integrative Biology, New Delhi, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India.
CSIR-Institute of Genomics and Integrative Biology, New Delhi, India.
Comput Biol Med. 2022 Jul;146:105419. doi: 10.1016/j.compbiomed.2022.105419. Epub 2022 Apr 25.
Data science has been an invaluable part of the COVID-19 pandemic response, with applications ranging from tracking viral evolution to understanding vaccine effectiveness. Asymptomatic breakthrough infections have been a major problem in assessing vaccine effectiveness in populations globally. Serological discrimination of vaccine response from infection has so far been limited to Spike protein vaccines, since whole-virion vaccines generate antibodies against all the viral proteins. Here, we show how a statistical and machine learning (ML) based approach can be used to discriminate between SARS-CoV-2 infection and the immune response to an inactivated whole-virion vaccine (BBV152, Covaxin). For this, we assessed serial data on antibodies against Spike and Nucleocapsid antigens, along with age, sex, number of doses taken, and days since last dose, for 1823 Covaxin recipients. An ensemble ML model, combining a consensus clustering approach with a support vector machine (SVM) model, was built on 1063 samples for which reliable qualifying data existed, and then applied to the entire dataset. Of 1448 self-reported negative subjects, our ensemble ML model classified 724 as infected. For method validation, we used a surrogate neutralization assay to determine the relative ability of a random subset of samples to neutralize the Delta versus the wild-type strain. We worked on the premise that antibodies generated by a whole-virion vaccine would neutralize the wild type more efficiently than the Delta strain. Of 156 samples in which the ML prediction differed from the self-reported uninfected status, 100 neutralized the Delta strain more effectively, indicating infection. We found that 71.8% of subjects were predicted to be infected during the surge, which is concordant with the percentage of sequences classified as Delta (75.6%-80.2%) over the same period. Our approach will help in real-world vaccine effectiveness assessments where whole-virion vaccines are commonly used.
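The validation logic rests on one comparison per sample: if a serum neutralizes Delta better than wild type, that pattern is taken to indicate infection rather than a vaccine-only response. The sketch below illustrates that check on samples where the ML call (infected) disagrees with a self-reported uninfected status. It is not the authors' code. The `Sample` record layout, the function names, and the toy values are hypothetical. The percent-inhibition helper uses the formula commonly applied to surrogate virus neutralization (sVNT) ELISAs, (1 - OD_sample / OD_negative_control) x 100, which is an assumption because the abstract does not state how inhibition was computed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sample:
    """One serum sample (hypothetical record layout, not the study's schema)."""
    sample_id: str
    self_reported_infected: bool
    ml_predicts_infected: bool
    inhibition_wt: float     # % inhibition against wild-type antigen
    inhibition_delta: float  # % inhibition against Delta antigen


def percent_inhibition(od_sample: float, od_negative_control: float) -> float:
    """Percent inhibition as commonly computed for sVNT ELISAs.

    Assumed formula: (1 - OD_sample / OD_negative_control) * 100.
    """
    return (1.0 - od_sample / od_negative_control) * 100.0


def delta_preferred(s: Sample) -> bool:
    """Premise from the abstract: vaccine-only antibodies should neutralize
    wild type more efficiently than Delta, so Delta > WT suggests infection."""
    return s.inhibition_delta > s.inhibition_wt


def validate_discordant(samples: list[Sample]) -> tuple[int, int]:
    """Among samples the ML model calls infected but whose self-reported
    status is uninfected, count how many neutralize Delta more effectively.

    Returns (supported, total_discordant).
    """
    discordant = [
        s for s in samples
        if s.ml_predicts_infected and not s.self_reported_infected
    ]
    supported = sum(delta_preferred(s) for s in discordant)
    return supported, len(discordant)


# Toy values for illustration only (not study data).
toy = [
    Sample("s1", False, True, 62.0, 81.5),   # discordant, Delta > WT
    Sample("s2", False, True, 74.0, 40.2),   # discordant, WT > Delta
    Sample("s3", False, False, 55.0, 30.0),  # concordant uninfected, skipped
    Sample("s4", True, True, 60.0, 88.0),    # self-reported infected, skipped
]
supported, total = validate_discordant(toy)
print(f"{supported}/{total} discordant samples neutralize Delta better")
```

Applied to the study's figures, the same tally corresponds to 100 of 156 discordant samples, or about 64.1%. Using a strict `>` comparison means a tie between Delta and wild-type inhibition is not counted as supporting infection; a real analysis might instead require a margin above assay noise.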