Unit of Genomics and Diabetes. Research Foundation of Valencia University Clinical Hospital-INCLIVA, Valencia, Spain.
Department of Microbiology, School of Medicine, University of Valencia, Valencia, Spain.
BMC Genomics. 2021 Nov 24;22(1):849. doi: 10.1186/s12864-021-08067-2.
Assembling the genomes of viruses with high mutation rates, such as Norovirus and other RNA viruses, or of viruses recovered from metagenome samples, is challenging because several viral quasispecies and strains coexist in the same sample. Furthermore, there is no standard method for obtaining whole-genome sequences from unrelated patients. After polyA RNA isolation and sequencing in eight patients with acute gastroenteritis, we evaluated two de Bruijn graph assemblers (SPAdes and MEGAHIT), each combined with four common pre-assembly strategies, and compared the combinations that yielded whole-genome Norovirus contigs.
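To illustrate why coexisting quasispecies complicate de Bruijn graph assembly, the following is a minimal toy sketch, not the algorithm implemented by SPAdes or MEGAHIT, which add error correction, multiple k-mer sizes, and graph simplification. The function names `build_de_bruijn` and `walk_unambiguous`, the reads, and the value of k are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in at least one read."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge: prefix (k-1)-mer -> suffix (k-1)-mer.
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the current node has exactly one distinct successor."""
    contig = start
    node = start
    seen = {start}
    while True:
        successors = set(graph.get(node, []))
        if len(successors) != 1:
            break  # dead end or branch: the contig cannot be extended unambiguously
        nxt = successors.pop()
        if nxt in seen:
            break  # avoid looping forever on a cycle
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Hypothetical overlapping reads drawn from a single sequence.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
single_strain = walk_unambiguous(build_de_bruijn(reads, k=4), "ATG")

# Adding one read from a hypothetical variant creates a branch in the graph.
mixed_reads = reads + ["GGCGAG"]
mixed_strains = walk_unambiguous(build_de_bruijn(mixed_reads, k=4), "ATG")
```

With reads from a single sequence, the walk recovers the full toy sequence as one contig. Once a single variant read is added, the node `GCG` gains two successors and the contig stops at the branch point, which is a simplified picture of how intra-sample variation can fragment an assembly.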
With SPAdes, reference-genome-guided strategies using either the host or the target virus genome offered no advantage over assembling unfiltered data; with MEGAHIT, only host genome filtering improved the results. MEGAHIT performed better than SPAdes in most samples, reaching complete genome sequences in most of them under all the strategies employed. Read binning with CD-HIT improved assembly when paired with the different analysis strategies, and more notably with SPAdes.
Not all metagenome assemblies are equal, and the choice of workflow depends on the species studied and on the steps preceding the analysis. Because of high intra-host variability, different approaches may be needed even for samples that were treated identically. We tested and compared different workflows for the accurate assembly of Norovirus genomes and established their assembly capacities for this purpose.