Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
Cell Syst. 2022 Jul 20;13(7):561-573.e5. doi: 10.1016/j.cels.2022.06.001. Epub 2022 Jul 6.
The development of new vaccines, as well as our understanding of key processes that shape viral evolution and host antibody repertoires, relies on measuring multiple antibody responses against large panels of viruses. Given the enormous diversity of circulating virus strains and antibody responses, comprehensively testing all antibody-virus interactions is infeasible. Even within individual studies with limited panels, exhaustive testing is not always performed, and there is no common framework for combining information across studies with partially overlapping panels, especially when the assay type or host species differ. Prior studies have demonstrated that antibody-virus interactions can be characterized in a vastly simpler, lower-dimensional space, suggesting that relatively few measurements could predict unmeasured antibody-virus interactions. Here, we apply matrix completion to several large-scale influenza and HIV-1 studies. We explore how prediction accuracy evolves as the number of measurements changes and approximate the number of additional measurements necessary in several highly incomplete datasets (suggesting ∼250,000 measurements could be saved). In addition, we show how the method can combine disparate datasets, even when the number of available measurements is below the theoretical limit that guarantees successful prediction. This approach can be readily generalized to other viruses or, more broadly, to other low-dimensional biological datasets.