Valieris Renan, Drummond Rodrigo D, Defelicibus Alexandre, Dias-Neto Emmanuel, Rosales Rafael A, Tojal da Silva Israel
Laboratory of Computational Biology and Bioinformatics, CIPE/A.C. Camargo Cancer Center, São Paulo 01508-010, Brazil.
Laboratory of Medical Genomics, CIPE/A.C. Camargo Cancer Center, São Paulo 01508-010, Brazil.
Bioinformatics. 2022 Mar 28;38(7):1809-1815. doi: 10.1093/bioinformatics/btac047.
Despite the rapid development of highly effective vaccines to control the current COVID-19 pandemic, the unequal distribution and availability of these vaccines worldwide, together with the number of people infected, have led to the continuous emergence of Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2) variants of concern. Real-time genomic surveillance is therefore likely to remain necessary as a continuous monitoring tool for following the spread of the disease and the evolution of the virus. In this context, the emergence of new SARS-CoV-2 genomic variants, including variants refractory to current vaccines, makes genomic surveillance programs tools of the utmost importance. Nevertheless, the lack of appropriate analytical tools to quickly and effectively assess the viral composition of meta-transcriptomic sequencing data, including data from environmental surveillance, is a challenge that may hinder the rapid adoption of this approach to mitigate viral spread and transmission.
We propose a statistical model for estimating the relative frequencies of SARS-CoV-2 variants in pooled samples. The model is built on a previously defined selection of genomic polymorphisms that characterize SARS-CoV-2 variants. The methods described here support both raw sequencing reads, from which the polymorphism-based markers are called, and predefined markers supplied in the Variant Call Format (VCF). Results on simulated data show that the method is effective at recovering the correct variant proportions. Furthermore, results on longitudinal wastewater data from two locations in Switzerland agree well with the epidemiological evolution of COVID-19 variants observed in clinical samples from those locations. These results show that the method can be a valuable tool for tracking the proportions of SARS-CoV-2 variants in complex mixtures such as wastewater and other environmental samples.
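To illustrate the general idea of estimating variant proportions from marker polymorphisms, here is a minimal sketch. It is not the authors' LCS model. It assumes a simple read-level mixture: each variant is described by a hypothetical binary marker table (1 if the variant carries the alternative allele at a marker, else 0), each read is assumed to match its source variant's allele with probability `1 - error`, and the mixture proportions are fitted by expectation-maximization (EM). The function name `estimate_proportions`, the `error` parameter and the marker layout are illustrative assumptions.

```python
from typing import Dict, List


def estimate_proportions(
    markers: Dict[str, List[int]],
    alt_counts: List[int],
    depths: List[int],
    error: float = 0.01,
    iterations: int = 500,
) -> Dict[str, float]:
    """Hypothetical EM sketch of variant-proportion estimation (not the LCS model).

    markers[v][j] is 1 if variant v carries the alternative allele at marker j, else 0.
    alt_counts[j] and depths[j] are the alt-allele and total read counts at marker j.
    """
    if not 0.0 < error < 0.5:
        raise ValueError("error must be in (0, 0.5)")
    names = list(markers)
    k = len(names)
    total_reads = sum(depths)
    if total_reads == 0:
        raise ValueError("no reads at any marker")
    props = [1.0 / k] * k  # start from a uniform mixture

    for _ in range(iterations):
        expected = [0.0] * k  # expected number of reads attributed to each variant
        for j, (alt, depth) in enumerate(zip(alt_counts, depths)):
            # Treat alt-allele reads and reference-allele reads separately.
            for count, read_is_alt in ((alt, True), (depth - alt, False)):
                if count == 0:
                    continue
                # E-step: P(read | variant) is 1 - error when the alleles agree, else error.
                weights = [
                    p * ((1.0 - error) if (markers[v][j] == 1) == read_is_alt else error)
                    for p, v in zip(props, names)
                ]
                norm = sum(weights)
                for i in range(k):
                    expected[i] += count * weights[i] / norm
        # M-step: proportions are the average responsibilities over all reads.
        props = [e / total_reads for e in expected]
    return dict(zip(names, props))


markers = {"A": [1, 1, 0, 0], "B": [0, 0, 1, 1]}
est = estimate_proportions(markers, [700, 700, 300, 300], [1000, 1000, 1000, 1000])
print(est)
```

With these made-up counts the estimate for variant A lands close to 0.7. It is not exactly 0.7 because the model attributes some reads to sequencing error even though the toy data contain none. A real tool must also handle markers shared by several variants, missing coverage and lineage hierarchies, none of which this sketch addresses.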
http://github.com/rvalieris/LCS.
Supplementary data are available at Bioinformatics online.