Easing genomic surveillance: A comprehensive performance evaluation of long-read assemblers across multi-strain mixture data of HIV-1 and other pathogenic viruses for constructing a user-friendly bioinformatic pipeline.

Affiliation

Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, 50200, Thailand.

Publication information

F1000Res. 2024 May 31;13:556. doi: 10.12688/f1000research.149577.1. eCollection 2024.

DOI:10.12688/f1000research.149577.1

PMID:38984017

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11231628/

Abstract

BACKGROUND

Determining the appropriate computational requirements and software performance is essential for efficient genomic surveillance. The lack of standardized benchmarking complicates software selection, especially with limited resources.

METHODS

We developed a containerized benchmarking pipeline to evaluate seven long-read assemblers (Canu, GoldRush, MetaFlye, Strainline, HaploDMF, iGDA, and RVHaplo) for viral haplotype reconstruction, using both simulated and experimental Oxford Nanopore sequencing data of HIV-1 and other viruses. Benchmarking was conducted on three computational systems to assess each assembler's performance, with QUAST and BLASTN used for quality assessment.
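As an illustration of the kind of measurement such a benchmark collects, the following is a minimal sketch of a wall-clock timing harness. It is not the authors' pipeline: the function names `time_command` and `benchmark` are hypothetical, and the commands being timed are trivial Python placeholders standing in for real assembler invocations, which the abstract does not specify.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) once and return (exit_code, wall_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed


def benchmark(commands):
    """Time each labelled command; commands maps a label to an argument list."""
    results = {}
    for label, cmd in commands.items():
        exit_code, seconds = time_command(cmd)
        results[label] = {"exit_code": exit_code, "wall_seconds": seconds}
    return results


# Placeholder commands (assumption): substitute real assembler command lines here.
demo = benchmark({
    "placeholder_fast": [sys.executable, "-c", "pass"],
    "placeholder_slow": [sys.executable, "-c", "import time; time.sleep(0.2)"],
})
for label, stats in demo.items():
    print(label, stats["exit_code"], round(stats["wall_seconds"], 2))
```

A fuller comparison would repeat each run several times and also record CPU and memory usage on each machine, which this sketch leaves out.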

RESULTS

Our findings show that assembler choice significantly affects assembly time, whereas CPU and memory usage have minimal effect. Assembler selection also influences contig size. A minimum read length of 2,000 nucleotides is required for quality assembly, and a read length of 4,000 nucleotides improves quality further. Canu was efficient among the assemblers but not suitable for multi-strain mixtures, while GoldRush produced only consensus assemblies. Strainline and MetaFlye were suitable for metagenomic sequencing data, with Strainline requiring high memory and MetaFlye operable on low-specification machines. Among reference-based assemblers, iGDA had high error rates; RVHaplo showed the best runtime and accuracy but became ineffective with similar sequences; and HaploDMF, which uses machine learning, had fewer errors with a slightly longer runtime.
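The read-length threshold reported above could be applied as a pre-filtering step before assembly. The sketch below is one way to do that under stated assumptions: the function `filter_fastq_by_length` and the helper `make_record` are hypothetical names, the input is assumed to be plain four-line FASTQ records with no blank lines, and the demonstration reads are synthetic rather than taken from the study.

```python
import io


def filter_fastq_by_length(handle, min_length=2000):
    """Yield (header, sequence, quality) for reads with at least min_length bases.

    Assumes each record occupies exactly four lines (header, sequence, '+', quality).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        if len(sequence) >= min_length:
            yield header, sequence, quality


def make_record(name, length):
    """Build a synthetic FASTQ record of the given length (demonstration only)."""
    return f"@{name}\n{'A' * length}\n+\n{'I' * length}\n"


# Synthetic input: one read below and one above the 2,000-nucleotide threshold.
demo_input = io.StringIO(
    make_record("short_read", 1500) + make_record("long_read", 4200)
)
kept = [header for header, _, _ in filter_fastq_by_length(demo_input, min_length=2000)]
print(kept)
```

Raising `min_length` to 4000 would correspond to the stricter read length that the results associate with further quality gains, at the cost of discarding more reads.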

CONCLUSIONS

The HIV-64148 pipeline, containerized using Docker, facilitates easy deployment and offers flexibility to select from a range of assemblers to match computational systems or study requirements. This tool aids in genome assembly and provides valuable information on HIV-1 sequences, enhancing viral evolution monitoring and understanding.
