Litvin Ulad, Lytras Spyros, Jack Alexander, Robertson David L, Hughes Joseph, Grove Joe
MRC-University of Glasgow Centre for Virus Research, Glasgow, UK.
Division of Systems Virology, Department of Microbiology and Immunology, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan.
Mol Syst Biol. 2025 Sep 16. doi: 10.1038/s44320-025-00147-9.
Viruses are genetic parasites of cellular life. Tolerance to genetic change, high mutation rates, adaptations to hosts, and immune escape have driven extensive sequence divergence of viral genes, hampering phylogenetic inference and functional annotation. Protein structure, however, is more conserved, allowing searches for distant homologs and revealing otherwise obscured evolutionary histories. Viruses are underrepresented in current protein structure databases, but this can be addressed by recent advances in machine learning. Using AlphaFold2-ColabFold and ESMFold, we predicted structures for >85,000 proteins from >4,400 viruses, expanding viral coverage 30-fold compared to experimental structures. Using these data, we map form and function across the human and animal virosphere and examine the evolutionary history of viral class-I fusion glycoproteins, revealing the potential origins of the coronavirus spike glycoprotein. Our database, Viro3D (https://viro3d.cvr.gla.ac.uk/), will allow the virology community to fully benefit from the structure prediction revolution, facilitating fundamental molecular virology and structure-informed design of therapies and vaccines.