NeoRdRp: A Comprehensive Dataset for Identifying RNA-dependent RNA Polymerases of Various RNA Viruses from Metatranscriptomic Data.
Affiliations
Department of Microbiology and Infection Control, Faculty of Medicine, Osaka Medical and Pharmaceutical University.
Laboratory of Fungal Interaction and Molecular Biology (donated by IFO), Department of Life and Environmental Sciences, University of Tsukuba.
Publication information
Microbes Environ. 2022;37(3). doi: 10.1264/jsme2.ME22001.
RNA viruses are distributed throughout various environments, and most have recently been identified by metatranscriptome sequencing. However, due to the high nucleotide diversity of RNA viruses, it is still challenging to identify novel RNA viruses from metatranscriptome data. To overcome this issue, we created a dataset of RNA-dependent RNA polymerase (RdRp) domains, which are essential for all RNA viruses belonging to Orthornavirae. Genes with RdRp domains from various RNA viruses were clustered based on amino acid sequence similarity. A multiple sequence alignment was generated for each cluster, and a hidden Markov model (HMM) profile was created when the cluster contained more than three sequences. We further refined 426 HMM profiles by using them to detect RefSeq RNA virus sequences and then combining the hit sequences with the RdRp domains. As a result, 1,182 HMM profiles were generated from 12,502 RdRp domain sequences, and the dataset was named NeoRdRp. The majority of NeoRdRp HMM profiles specifically detected RdRp domains in the UniProt dataset. Furthermore, we compared the NeoRdRp dataset with two previously reported methods for RNA virus detection using metatranscriptome sequencing data. Our method identified the majority of RNA viruses in the datasets; however, like the other two methods, it failed to detect some RNA viruses. NeoRdRp can be iteratively improved by adding new RdRp sequences and is applicable as a system for detecting various RNA viruses in diverse metatranscriptome data.
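The profile-building rule described in the abstract (one HMM profile per cluster, only when the cluster holds more than three sequences) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the cluster dictionary layout, and the file-naming scheme are hypothetical, and the abstract does not state which tools were used. The sketch assumes HMMER's `hmmbuild`, whose basic invocation is `hmmbuild <hmmfile_out> <msafile>`, and it only assembles the command without running it.

```python
from typing import Dict, List


def clusters_for_hmm(
    clusters: Dict[str, List[str]], threshold: int = 3
) -> Dict[str, List[str]]:
    """Keep clusters with strictly more than `threshold` sequences.

    The abstract states that a profile was created when the number of
    sequences was greater than three, hence the strict comparison.
    """
    return {cid: seqs for cid, seqs in clusters.items() if len(seqs) > threshold}


def hmmbuild_command(cluster_id: str) -> List[str]:
    """Assemble an hmmbuild call for one cluster's alignment.

    Assumption: the alignment for each cluster is stored as
    '<cluster_id>.aln.fasta' (a hypothetical naming scheme).
    """
    return ["hmmbuild", f"{cluster_id}.hmm", f"{cluster_id}.aln.fasta"]


if __name__ == "__main__":
    # Toy clusters of sequence identifiers (hypothetical data).
    toy = {
        "c1": ["s1", "s2", "s3"],        # exactly three: skipped
        "c2": ["s4", "s5", "s6", "s7"],  # more than three: kept
    }
    for cid in clusters_for_hmm(toy):
        print(" ".join(hmmbuild_command(cid)))
```

In practice the resulting commands would be passed to something like `subprocess.run`, and the profiles would then be searched against translated metatranscriptome contigs with a tool such as `hmmsearch`; both steps are assumptions about a typical workflow rather than details given in the abstract.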