Department of Informatics, Systems and Communication, University of Milano-Bicocca, Viale Sarca 336, 20136, Milan, Italy.
Department of Computational Biology, Institut Pasteur, 25-28 Rue du Dr Roux, 75015, Paris, France.
BMC Bioinformatics. 2022 Apr 19;22(Suppl 15):625. doi: 10.1186/s12859-022-04668-0.
Efficiently calling variants from the growing volume of sequencing data produced daily from multiple viral strains is of the utmost importance for tracking the spread of those strains across the globe, as the COVID-19 pandemic demonstrated.
We present MALVIRUS, an easy-to-install and easy-to-use application that assists users in multiple tasks required for the analysis of a viral population, such as SARS-CoV-2. MALVIRUS allows users to: (1) construct a variant catalog consisting of a set of variations (SNPs/indels) from the population sequences, (2) efficiently genotype and annotate the variants of the catalog that are supported by a read sample, and (3) when the viral species under consideration is SARS-CoV-2, assign the input sample to the most likely Pango lineages using the genotyped variations.
Tests on Illumina and Nanopore samples demonstrated the efficiency and effectiveness of MALVIRUS in analyzing SARS-CoV-2 strain samples with respect to publicly available data provided by NCBI and the more complete dataset provided by GISAID. A comparison with state-of-the-art tools showed that MALVIRUS is always more precise and often has better recall.