Department of Comparative Biomedical Sciences, Istituto Zooprofilattico Sperimentale delle Venezie (IZSVe), viale dell'Università 10, 35120, Legnaro (PD), Italy.
French Agency for Food, Environmental and Occupational Health & Safety (ANSES), Ploufragan-Plouzané-Niort Laboratory, Viral Genetics and Biosecurity Unit, 22440, Ploufragan, France.
Virol J. 2019 Nov 21;16(1):140. doi: 10.1186/s12985-019-1223-8.
Next-generation sequencing (NGS) is becoming widely used in diagnostic and research laboratories and is now applied in a variety of disciplines, including veterinary virology. The NGS workflow comprises several steps: sample processing, library preparation, sequencing, and primary, secondary and tertiary bioinformatics (BI) analyses. The BI analysis is a complex process that is extremely difficult to standardize because of the variety of tools and metrics available. It is therefore of the utmost importance to assess the comparability of results obtained through different methods and in different laboratories. To this end, we organized a proficiency test focused on the bioinformatics components of generating complete genome sequences of salmonid rhabdoviruses.
Three partners, which performed virus sequencing using different commercial library preparation kits and NGS platforms, shared 75 raw datasets with one another. Each participant analyzed the datasets separately to produce a consensus sequence according to its own bioinformatics pipeline. The results were then compared to highlight discrepancies, and a subset of the inconsistencies was investigated in more detail.
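The comparison step described above can be illustrated with a minimal Python sketch. It assumes the consensus sequences produced by the participants for one dataset have already been aligned to the same length (gaps written as `-`). The function names, the lab labels and the toy sequences are hypothetical, and this is not the procedure used in the study.

```python
from itertools import combinations


def pairwise_discrepancies(seq_a, seq_b):
    """Return the 0-based alignment columns where two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b
    ]


def percent_identity(seq_a, seq_b):
    """Percentage of alignment columns on which the two sequences agree."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    n_diff = len(pairwise_discrepancies(seq_a, seq_b))
    return 100.0 * (len(seq_a) - n_diff) / len(seq_a)


def compare_all(consensus):
    """Compare every pair of participants.

    consensus maps a participant label to its aligned consensus sequence.
    Returns {(label_1, label_2): (discrepant_columns, percent_identity)}.
    """
    return {
        (n1, n2): (pairwise_discrepancies(s1, s2), percent_identity(s1, s2))
        for (n1, s1), (n2, s2) in combinations(sorted(consensus.items()), 2)
    }


# Toy, made-up sequences for illustration only.
toy = {
    "lab_A": "ACGTACGTAC",
    "lab_B": "ACGTACGTAC",
    "lab_C": "ACGAACG-AC",
}
report = compare_all(toy)
for pair, (columns, identity) in report.items():
    print(pair, columns, f"{identity:.2f}%")
```

A per-column comparison like this counts a multi-base indel as several discrepancies; a real exercise would need to agree beforehand on how such events are counted.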
In total, we observed 526 discrepancies, of which 39.5% were located at genome termini, 14.1% in intergenic regions and 46.4% in coding regions. Among these, 10 SNPs and 99 indels caused changes in the protein products. Overall reproducibility was 99.94%. In the subset of inconsistencies investigated in more depth, manual curation appeared to be the most critical step affecting sequence comparability, suggesting that harmonizing this phase is crucial to obtaining comparable results. Analysis of a calibrator sample allowed BI accuracy to be assessed; it was 99.983%.
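A breakdown of discrepancies by genome region, like the one reported above, could be computed along the following lines. This is a sketch under stated assumptions: positions are 1-based, genome termini are approximated as a fixed number of bases at each end, and the genome length, coding intervals and discrepancy positions are invented for illustration. None of these names or values come from the study.

```python
from collections import Counter


def classify_position(pos, genome_length, cds_intervals, terminus_length):
    """Assign a 1-based genome position to 'terminus', 'coding' or 'intergenic'.

    cds_intervals is a list of inclusive (start, end) coding intervals.
    terminus_length is an assumed size for each genome end.
    """
    if pos <= terminus_length or pos > genome_length - terminus_length:
        return "terminus"
    if any(start <= pos <= end for start, end in cds_intervals):
        return "coding"
    return "intergenic"


def region_breakdown(positions, genome_length, cds_intervals, terminus_length):
    """Return the percentage of discrepancies falling in each region."""
    counts = Counter(
        classify_position(p, genome_length, cds_intervals, terminus_length)
        for p in positions
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


# Hypothetical annotation and discrepancy positions, for illustration only.
breakdown = region_breakdown(
    positions=[3, 95, 20, 43, 50],
    genome_length=100,
    cds_intervals=[(11, 40), (46, 90)],
    terminus_length=10,
)
print(breakdown)
```

Checking the termini first means a discrepancy that falls both near a genome end and inside a coding interval is counted as terminal; a different precedence would shift the percentages.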
We demonstrated the applicability and usefulness of BI proficiency testing for assuring the quality of NGS data, and we recommend wider implementation of such exercises to guarantee the uniformity of sequence data among virology laboratories.