Galperina Anastasia, Lugli Gabriele Andrea, Milani Christian, De Vos Willem M, Ventura Marco, Salonen Anne, Hurwitz Bonnie, Ponsero Alise Jany
Human Microbiome Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland.
Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy.
PLoS Comput Biol. 2025 May 2;21(5):e1012268. doi: 10.1371/journal.pcbi.1012268. eCollection 2025 May.
The growing interest in the role of the gut virome in human health and disease has led to several recent large-scale viral catalogue projects mining human gut metagenomes, each using different computational tools and quality-control criteria. However, to date there has been no consistent comparison of these catalogues' quality, diversity, and overlap. In this project, we therefore systematically surveyed nine previously published human gut viral catalogues. Although these catalogues collectively screened >40,000 human fecal metagenomes, 82% of the 345,613 recovered viral sequences were unique to a single catalogue, highlighting the limited redundancy between the resources and suggesting the need for an aggregated resource that brings these viral sequences together. We further expanded these viral catalogues by mining 7,867 infant gut metagenomes from 12 large-scale infant studies conducted in 9 countries. From these datasets, we constructed the Aggregated Gut Viral Catalogue (AVrC), a unified, modular resource containing 1,018,941 dereplicated viral sequences (449,859 species-level vOTUs). Using computational inference tools, we annotated the representative sequence of each vOTU with its quality, viral taxonomy, predicted viral lifestyle, and putative host. This project aims to facilitate the reuse of previously published viral catalogues by the research community and follows a modular framework that enables future expansion as new data become available.
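To make the overlap statistic concrete, the following is a minimal Python sketch of one way a "fraction of sequences unique to a single catalogue" could be computed. It assumes that every sequence has already been assigned to a cross-catalogue cluster (for example, a species-level vOTU). The function name, the input layout, and the toy data are hypothetical illustrations, not the authors' pipeline.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def fraction_unique_to_one(seqs: Dict[str, Tuple[str, str]]) -> float:
    """Fraction of sequences whose cluster contains members from only one catalogue.

    `seqs` maps a sequence ID to a (catalogue name, cluster ID) pair.
    The cluster assignment is assumed to come from a prior clustering step.
    """
    if not seqs:
        return 0.0
    # Collect the set of catalogues represented in each cluster.
    catalogues_per_cluster: Dict[str, Set[str]] = defaultdict(set)
    for catalogue, cluster in seqs.values():
        catalogues_per_cluster[cluster].add(catalogue)
    # A sequence counts as unique if its cluster spans exactly one catalogue.
    unique = sum(
        1 for _, cluster in seqs.values()
        if len(catalogues_per_cluster[cluster]) == 1
    )
    return unique / len(seqs)


# Toy example with made-up catalogue names and cluster IDs.
toy = {
    "s1": ("A", "c1"),
    "s2": ("A", "c2"),
    "s3": ("B", "c2"),
    "s4": ("B", "c3"),
    "s5": ("C", "c4"),
}
print(fraction_unique_to_one(toy))  # prints 0.6
```

Here cluster `c2` is shared by catalogues A and B, so `s2` and `s3` are not unique, while `s1`, `s4`, and `s5` are, giving 3 of 5 sequences. Counting per sequence rather than per cluster mirrors how the abstract phrases the statistic (a share of recovered sequences).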