Synthetic community Hi-C benchmarking provides a baseline for virus-host inferences.

Authors

Shatadru Rokaiya Nurani, Solonenko Natalie E, Sun Christine L, Sullivan Matthew B

Publication information

bioRxiv. 2025 Apr 23:2025.02.12.637985. doi: 10.1101/2025.02.12.637985.

Abstract

Microbiomes influence diverse ecosystems, but viruses increasingly appear to impose key constraints on them. While viromics has expanded genomic catalogs, host identification for these viruses remains challenging because cultivation-based approaches scale poorly and bioinformatic predictions have uncertain reliability and relatively low resolution, particularly for understudied viral taxa. To address this, Hi-C proximity ligation sequences cross-linked virus and host genomic fragments to infer virus-host linkages, and it has now been applied in at least nine studies. However, its accuracy remains unknown. Here we assess Hi-C performance in recovering virus-host interactions using synthetic communities (SynComs) composed of four bacterial strains and nine phages with known interactions, and then apply optimized protocols to natural soil samples. In SynComs, standard Hi-C sample preparations and analyses showed poor normalized linkage score performance (26% specificity, 100% sensitivity, incorrect matches up to the class level). Z-score filtering (Z ≥ 0.5) dramatically improved specificity to 99%, though at reduced sensitivity (62%, down from 100%). We also established a detection limit: reproducibility was poor below a minimal phage abundance of 10 PFU/mL. Applying optimized protocols to natural soil samples, we compared Hi-C-inferred virus-host linkages with bioinformatic predictions. Prior to Z-score thresholding, agreement was relatively high at the phylum to family levels (72%), but not at the genus (43%) or species (15%) levels. Z-score thresholding reduced sensitivity (only 34% of predictions were retained), with only modest improvements in congruence with bioinformatic methods (48% and 18% at the genus and species levels, respectively). Regardless, this yielded 79 genus-level-congruent virus-host linkages and 293 new ones revealed by Hi-C alone, providing many new virus-host interactions to explore in already well-studied, climate-critical soils. Overall, these findings provide empirical benchmarks and methodological guidelines to improve the accuracy and reliability of Hi-C for virus-host linkage studies in complex microbial communities.
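
The trade-off described in the abstract, where Z-score filtering raises specificity at the cost of sensitivity, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not say how the Z-scores are computed, so here they are taken across all candidate pairs' normalized linkage scores, and the phage names, host names, and score values are hypothetical toy data rather than results from the study.

```python
from statistics import mean, pstdev


def zscore_filter(scores, threshold=0.5):
    """Keep virus-host pairs whose linkage-score Z-score is >= threshold.

    Assumption: Z-scores are computed across all candidate pairs at once;
    the abstract does not specify the exact normalization.
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return set()  # no spread in scores, so nothing stands out
    return {pair for pair, s in scores.items() if (s - mu) / sigma >= threshold}


def sensitivity_specificity(predicted, true_pairs, all_pairs):
    """Return (sensitivity, specificity) against known interactions."""
    negatives = all_pairs - true_pairs
    tp = len(predicted & true_pairs)
    fn = len(true_pairs - predicted)
    fp = len(predicted & negatives)
    tn = len(negatives - predicted)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical normalized linkage scores for a toy synthetic community.
scores = {
    ("phageA", "host1"): 9.0,
    ("phageA", "host2"): 1.0,
    ("phageB", "host1"): 1.0,
    ("phageB", "host2"): 8.0,
    ("phageC", "host1"): 2.0,
    ("phageC", "host2"): 1.5,
}
# Hypothetical ground truth: the known infecting pairs.
true_pairs = {("phageA", "host1"), ("phageB", "host2"), ("phageC", "host1")}
all_pairs = set(scores)

# Unfiltered: every pair with any Hi-C signal counts as a predicted linkage.
unfiltered = {pair for pair, s in scores.items() if s > 0}
raw_sens, raw_spec = sensitivity_specificity(unfiltered, true_pairs, all_pairs)

# Filtered: only pairs with Z >= 0.5 are kept.
kept = zscore_filter(scores, threshold=0.5)
filt_sens, filt_spec = sensitivity_specificity(kept, true_pairs, all_pairs)

print("unfiltered:", raw_sens, raw_spec)
print("filtered:  ", filt_sens, filt_spec)
```

In this toy data, the unfiltered call set recovers every true pair but also accepts every false one, while the filtered set drops all false pairs and loses one weakly linked true pair. That mirrors the direction of the trade-off the abstract reports, not its magnitudes.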

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fee7/12452598/904520cd1848/nihpp-2025.02.12.637985v3-f0001.jpg
