Laboratory of Plant Pathology-TERRA-Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium.
Citrus Research International, Matieland, South Africa.
PeerJ. 2023 Aug 16;11:e15816. doi: 10.7717/peerj.15816. eCollection 2023.
Recent developments in high-throughput sequencing (HTS) technologies and bioinformatics have drastically changed research in virology, especially virus discovery. Proper monitoring of a viral population requires information on the different isolates circulating in the study area, and HTS has greatly facilitated sequencing new genomes of detected viruses and comparing them. However, the bioinformatics analyses used to reconstruct genome sequences and detect single nucleotide polymorphisms (SNPs) can introduce bias, an issue that has not been widely addressed so far. More knowledge is therefore needed on the limitations of predicting SNPs from HTS-generated sequence data. To address this issue, we compared the ability of 14 plant virology laboratories, each employing a different bioinformatics pipeline, to detect 21 variants of pepino mosaic virus (PepMV) in three samples through large-scale performance testing (PT) using three artificially designed datasets. To evaluate the impact of the bioinformatics analyses, we divided them into three key steps: read pre-processing, virus-isolate identification, and variant calling. Each step was evaluated independently through an original PT design that included discussion and validation among participants at each step. Overall, this work highlights key parameters influencing SNP detection and proposes recommendations for reliable variant calling for plant viruses. Identification of the closest reference, the mapping parameters, and manual validation of detections were recognized as the factors with the greatest impact on successful SNP detection. Strategies to improve SNP prediction are also discussed.
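To make the variant-calling step and the performance-testing comparison more concrete, here is a minimal Python sketch of frequency-threshold SNP calling against a chosen reference, followed by scoring of the calls against a known set of expected variants. It assumes that reads have already been pre-processed and mapped to the closest reference and that per-position base counts are available. The function names `call_snps` and `score_calls`, the depth and frequency thresholds, and the toy data are all hypothetical. None of them are taken from the study or from any specific pipeline used by the participating laboratories.

```python
from collections import Counter

# Hypothetical thresholds chosen for illustration; not values used in the study.
MIN_DEPTH = 10        # minimum read depth required to attempt a call
MIN_ALT_FREQ = 0.05   # minimum fraction of reads supporting the alternative base


def call_snps(reference, pileup, min_depth=MIN_DEPTH, min_alt_freq=MIN_ALT_FREQ):
    """Call SNPs from per-position base counts against a reference.

    reference: the reference sequence selected for the isolate (0-based indexing).
    pileup: dict mapping position -> Counter of bases observed in mapped reads.
    Returns a set of (position, ref_base, alt_base, alt_frequency) tuples.
    """
    calls = set()
    for pos, counts in pileup.items():
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little coverage to call reliably at this position
        ref_base = reference[pos]
        for base, n in counts.items():
            if base == ref_base:
                continue  # only non-reference bases are candidate SNPs
            freq = n / depth
            if freq >= min_alt_freq:
                calls.add((pos, ref_base, base, round(freq, 3)))
    return calls


def score_calls(calls, expected):
    """Compare calls with the expected (position, alt_base) pairs.

    Returns (true positives, false positives, false negatives).
    """
    called = {(pos, alt) for pos, _ref, alt, _freq in calls}
    return len(called & expected), len(called - expected), len(expected - called)


# Toy example: position 2 carries a 20% T variant, position 5 a 3% A variant
# (below the frequency threshold), and position 7 has too little coverage.
reference = "ACGTACGTAC"
pileup = {
    2: Counter({"G": 80, "T": 20}),
    5: Counter({"C": 97, "A": 3}),
    7: Counter({"T": 4, "C": 4}),
}
expected = {(2, "T"), (7, "C")}

calls = call_snps(reference, pileup)
print(sorted(calls))                    # [(2, 'G', 'T', 0.2)]
print(score_calls(calls, expected))     # (1, 0, 1)
```

Because every call is made relative to `reference`, the output depends directly on which reference was selected during virus-isolate identification. In the toy data, the expected variant at the low-coverage position 7 is counted as a false negative, and the 3% signal at position 5 is filtered out by the frequency threshold.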