Identification of RNA Virus-Derived RdRp Sequences in Publicly Available Transcriptomic Data Sets.

Affiliation

Division of Virology, Department of Pathology, Addenbrooke's Hospital, University of Cambridge, Cambridge, United Kingdom.

Publication information

Mol Biol Evol. 2023 Apr 4;40(4). doi: 10.1093/molbev/msad060.

Abstract

RNA viruses are abundant and highly diverse and infect all or most eukaryotic organisms. However, only a tiny fraction of the number and diversity of RNA virus species have been catalogued. To cost-effectively expand the diversity of known RNA virus sequences, we mined publicly available transcriptomic data sets. We developed 77 family-level Hidden Markov Model profiles for the viral RNA-dependent RNA polymerase (RdRp), the only universal "hallmark" gene of RNA viruses. By using these to search the National Center for Biotechnology Information Transcriptome Shotgun Assembly database, we identified 5,867 contigs encoding RNA virus RdRps or fragments thereof and analyzed their diversity, taxonomic classification, phylogeny, and host associations. Our study expands the known diversity of RNA viruses, and the 77 curated RdRp Profile Hidden Markov Models provide a useful resource for the virus discovery community.
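
The abstract does not say how nucleotide contigs were compared against the protein-level RdRp profiles. One common approach, assumed here purely for illustration, is to translate each contig in all six reading frames and then pass the resulting protein sequences to a profile search tool. The sketch below shows only the translation step, using the standard genetic code; the function names are hypothetical and are not taken from the study.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; ambiguous codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(contig):
    """Map frame labels ('+1'..'+3', '-1'..'-3') to protein translations."""
    contig = contig.upper().replace("U", "T")  # accept RNA or DNA input
    frames = {}
    for strand, seq in (("+", contig), ("-", reverse_complement(contig))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Toy contig, not real data.
    for frame, protein in six_frame_translations("ATGGGCGATGATTAA").items():
        print(frame, protein)
```

Translating all six frames keeps the sketch independent of strand and frame, which matters for assembled transcripts whose orientation is unknown. Stop codons are kept as `*` so that downstream code can split the translations into open reading frames if needed.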

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3234/10101049/0ef2007f47f4/msad060f1.jpg
