Boostrom Ian, Portal Edward A R, Spiller Owen B, Walsh Timothy R, Sands Kirsty
Division of Infection and Immunity, Department of Medical Microbiology, Cardiff University, Cardiff, United Kingdom.
Department of Zoology, Ineos Oxford Institute for Antimicrobial Research, University of Oxford, Oxford, United Kingdom.
Front Microbiol. 2022 Mar 3;13:796465. doi: 10.3389/fmicb.2022.796465. eCollection 2022.
Long-read sequencing (LRS) can resolve repetitive regions, a limitation of short-read (SR) data. Reduced cost and instrument size have led to a steady increase in the use of LRS across diagnostics and research. Here, we re-basecalled FAST5 data sequenced between 2018 and 2021 and analyzed the data in relation to gDNA across a large dataset (n = 200) spanning a wide GC content (25-67%). We examined whether re-basecalled data would improve the hybrid assembly and, for a smaller cohort, compared long-read (LR) assemblies in the context of antimicrobial resistance (AMR) genes and mobile genetic elements. We included a cost analysis when comparing SR and LR instruments. We compared the R9 and R10 chemistries and report not only a larger yield but also increased read quality with R9 flow cells. LR assemblies often showed discrepancies in antimicrobial resistance gene (ARG) presence/absence and/or variant detection. Flye-based assemblies were generally efficient at detecting the presence of ARGs on both the chromosome and plasmids. Raven ran more quickly but recovered small plasmids inconsistently, notably a ∼15-kb Col-like plasmid harboring blaCTX-M-15. Canu assemblies were the most fragmented, with genome sizes larger than expected. LR assemblies failed to consistently detect multiple copies of the same ARG identified in the Unicycler reference. Even with improvements to ONT chemistry and basecalling, long-read assemblies can lead to misinterpretation of data. If LR data are currently being relied upon, it is necessary to perform multiple assemblies, although this is computationally resource intensive and not yet readily available or usable.
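The comparison of ARG calls between long-read assemblies and the Unicycler hybrid reference can be illustrated with a minimal sketch. It assumes each assembly has already been screened for ARGs and reduced to a list of gene names, one entry per detected copy. The function name `compare_arg_calls` and the gene lists are hypothetical and invented for illustration; they are not results or code from the study.

```python
from collections import Counter


def compare_arg_calls(reference, assemblies):
    """Compare per-assembly ARG calls against a reference call set.

    reference  -- list of ARG names, one entry per copy, taken from the
                  reference assembly (here assumed to be a Unicycler hybrid).
    assemblies -- dict mapping an assembler name to its list of ARG names.

    Returns a dict keyed by assembler name with the genes that are missing,
    the genes that are unexpected, and the genes whose copy number differs
    from the reference (as (reference_copies, assembly_copies) tuples).
    """
    ref_counts = Counter(reference)
    report = {}
    for name, genes in assemblies.items():
        counts = Counter(genes)
        missing = sorted(g for g in ref_counts if counts[g] == 0)
        extra = sorted(g for g in counts if g not in ref_counts)
        copy_mismatch = {
            g: (ref_counts[g], counts[g])
            for g in sorted(ref_counts)
            if counts[g] > 0 and counts[g] != ref_counts[g]
        }
        report[name] = {
            "missing": missing,
            "extra": extra,
            "copy_mismatch": copy_mismatch,
            "concordant": not (missing or extra or copy_mismatch),
        }
    return report


# Hypothetical call sets, invented for illustration only.
reference_calls = ["blaCTX-M-15", "blaCTX-M-15", "blaTEM-1", "aac(3)-IIa"]
long_read_calls = {
    "flye": ["blaCTX-M-15", "blaTEM-1", "aac(3)-IIa"],
    "raven": ["blaTEM-1", "aac(3)-IIa"],
    "canu": ["blaCTX-M-15", "blaCTX-M-15", "blaTEM-1",
             "aac(3)-IIa", "aac(3)-IIa"],
}

if __name__ == "__main__":
    for assembler, result in compare_arg_calls(reference_calls,
                                               long_read_calls).items():
        print(assembler, result)
```

Counting copies rather than recording presence alone is a deliberate choice here: an assembly that collapses two copies of a gene into one would otherwise look concordant with the reference.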