National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, State Key Laboratory for Biocontrol, School of Medicine, Shenzhen Campus of Sun Yat-sen University, Sun Yat-sen University, Shenzhen, China.
Apsara Lab, Alibaba Cloud Intelligence, Alibaba Group, Hangzhou, China.
Cell. 2024 Nov 27;187(24):6929-6942.e16. doi: 10.1016/j.cell.2024.09.027. Epub 2024 Oct 9.
Current metagenomic tools can fail to identify highly divergent RNA viruses. We developed a deep learning algorithm, termed LucaProt, to discover highly divergent RNA-dependent RNA polymerase (RdRP) sequences in 10,487 metatranscriptomes generated from diverse global ecosystems. LucaProt integrates both sequence and predicted structural information, enabling the accurate detection of RdRP sequences. Using this approach, we identified 161,979 potential RNA virus species and 180 RNA virus supergroups, including many previously poorly studied groups, as well as RNA virus genomes of exceptional length (up to 47,250 nucleotides) and genomic complexity. A subset of these novel RNA viruses was confirmed by RT-PCR and RNA/DNA sequencing. Newly discovered RNA viruses were present in diverse environments, including air, hot springs, and hydrothermal vents, with virus diversity and abundance varying substantially among ecosystems. This study advances virus discovery, highlights the scale of the virosphere, and provides computational tools to better document the global RNA virome.