Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, New York, NY 10021, USA.
Center for Algorithmic Biotechnology, St. Petersburg State University, St. Petersburg 199004, Russia.
Bioinformatics. 2021 Dec 22;38(1):1-8. doi: 10.1093/bioinformatics/btab597.
The COVID-19 pandemic has ignited broad scientific interest in viral research in general and coronavirus research in particular. The identification and characterization of viral species in natural reservoirs typically involves de novo assembly. However, existing genome, metagenome and transcriptome assemblers are often unable to assemble many viruses (including coronaviruses) into a single contig. Coverage variation between and within datasets, the presence of closely related strains, splice variants and contamination set a high bar for assemblers that must process viral datasets with diverse properties.
We developed coronaSPAdes, a novel assembler for the recovery of RNA viral species in general and coronaviruses in particular. coronaSPAdes leverages knowledge about viral genome structures to improve assembly, extending ideas initially implemented in biosyntheticSPAdes. We show that coronaSPAdes outperforms existing SPAdes modes and other popular short-read metagenome and viral assemblers in the recovery of full-length RNA viral genomes.
The coronaSPAdes version used in this article is part of the SPAdes 3.15 release and is freely available at http://cab.spbu.ru/software/spades.
Supplementary data are available at Bioinformatics online.