Chair for Clinical Bioinformatics, Saarland University, Campus Building E2.1, 66123 Saarbrücken, Germany.
Institute of Clinical Molecular Biology, Christian Albrechts University of Kiel, Schittenhelmstr. 12, 24105 Kiel, Germany.
Brief Bioinform. 2018 May 1;19(3):495-505. doi: 10.1093/bib/bbw122.
Whole-genome sequencing (WGS) is gaining importance in the analysis of bacterial cultures derived from patients with infectious diseases. Existing computational tools for WGS-based identification have, however, been evaluated on previously defined data, thereby relying uncritically on the available taxonomic information. Here, we newly sequenced 846 clinical gram-negative bacterial isolates representing multiple distinct genera and compared the performance of five tools (CLARK, Kaiju, Kraken, DIAMOND/MEGAN and TUIT). To establish a faithful 'gold standard', the expert-driven taxonomy was compared with identifications based on matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) analysis. Additionally, the tools were evaluated using a data set of 200 Staphylococcus aureus isolates. CLARK and Kraken (with k = 31) performed best with 626 (100%) and 193 (99.5%) correct species classifications for the gram-negative and S. aureus isolates, respectively. Moreover, CLARK and Kraken demonstrated the highest mean F-measure values (85.5/87.9% and 94.4/94.7% for the two data sets, respectively) in comparison with DIAMOND/MEGAN (71 and 85.3%), Kaiju (41.8 and 18.9%) and TUIT (34.5 and 86.5%). Finally, CLARK, Kaiju and Kraken outperformed the other tools by a factor of 30 to 170 in terms of runtime. We conclude that the application of nucleotide-based tools using k-mers (e.g. CLARK or Kraken) allows for accurate and fast taxonomic characterization of bacterial isolates from WGS data. Hence, our results suggest WGS-based genotyping to be a promising alternative to MS-based biotyping in clinical settings. Moreover, we suggest that complementary information should be used for the evaluation of taxonomic classification tools, as public databases may suffer from suboptimal annotations.
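For readers unfamiliar with the mean F-measure reported above, the following is a minimal sketch of how such a value is commonly computed: the harmonic mean of precision and recall per taxon, averaged over taxa. The abstract does not state how the counts were aggregated, so the per-taxon averaging, the function names and the example counts here are illustrative assumptions, not the authors' procedure or data.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F-measure) for a single taxon.

    Uses the standard F1 definition (harmonic mean of precision and recall).
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def mean_f_measure(counts_per_taxon):
    """Average the per-taxon F-measure (assumed macro-averaging)."""
    scores = [f_measure(*counts)[2] for counts in counts_per_taxon.values()]
    return sum(scores) / len(scores)


# Hypothetical counts (TP, FP, FN) for two made-up taxa; not data from the study.
example_counts = {
    "taxon_a": (8, 2, 0),
    "taxon_b": (5, 0, 5),
}

if __name__ == "__main__":
    for name, counts in example_counts.items():
        p, r, f = f_measure(*counts)
        print(f"{name}: precision={p:.3f} recall={r:.3f} F={f:.3f}")
    print(f"mean F-measure: {mean_f_measure(example_counts):.3f}")
```

Because the F-measure penalizes both false assignments and missed assignments, a tool can reach a high share of correct species calls and still show a lower mean F-measure if its errors are concentrated in a few taxa.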