Ndekezi Christian, Byamukama Drake, Kato Frank, Omara Denis, Nakyanzi Angella, Natwijuka Fortunate, Mugaba Susan, Ssekagiri Alfred, Bbosa Nicholas, Sande Obondo James, Kimuda Magambo Phillip, Byarugaba Denis K, Kapaata Anne, Sutar Jyoti, Bhattacharya Jayanta, Kaleebu Pontiano, Balinda Sheila N
Medical Research Council/Uganda Virus Research Institute & London School of Hygiene and Tropical Medicine (MRC), Entebbe, P.O. Box 49, Uganda.
College of Health Sciences, Department of Immunology and Molecular Biology, Makerere University, Kampala, P.O. Box 7062, Uganda.
Bioinform Adv. 2025 May 13;5(1):vbaf115. doi: 10.1093/bioadv/vbaf115. eCollection 2025.
Viral genome sequencing and analysis are crucial for understanding the diversity and evolution of viruses. Traditional Sanger sequencing is limited by low sequencing depth and is labor-intensive. Next-generation sequencing (NGS) methods, such as Illumina, offer greater sequencing depth and throughput but struggle to reconstruct viral genomes accurately because the genome is fragmented into short reads. Third-generation sequencing platforms, such as PacBio and Oxford Nanopore Technologies (ONT), generate long reads with high throughput. However, PacBio is constrained by substantial resource requirements, while ONT suffers from inherently high error rates. Moreover, standardized ONT pipelines that span everything from basecalling to genome assembly remain limited.
Here, we introduce BonoboFlow, a standardized Nextflow pipeline designed to streamline ONT-based viral genome assembly and haplotype reconstruction. BonoboFlow integrates the key processing steps: basecalling, read filtering, chimeric read removal, error correction, draft genome assembly/haplotype reconstruction, and genome polishing. The pipeline accepts raw POD5 or basecalled FASTQ files as input, produces consensus sequences in FASTA format as output, and uses a reference genome (in FASTA format) to filter out contaminant reads. BonoboFlow's containerized implementation via Docker and Singularity ensures seamless deployment across diverse computing environments. While BonoboFlow excels at assembling small and medium-sized viral genomes, it performed less well when reconstructing large viral genomes.
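The abstract names read filtering as one pipeline step but does not say which tool or thresholds BonoboFlow uses. The Python below is a minimal sketch of what length- and quality-based filtering of basecalled FASTQ reads generally looks like. It assumes 4-line FASTQ records with Phred+33 quality encoding. The function names (`read_fastq`, `mean_read_quality`, `filter_reads`) and the default thresholds are hypothetical illustrations. They are not BonoboFlow's own implementation or defaults. Mean read quality is computed by averaging per-base error probabilities and converting the result back to a Phred score, a common convention for ONT reads, rather than by averaging the Phred values directly.

```python
import math
from typing import Iterable, Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield FastqRecord(header[1:].strip(), seq, qual)


def mean_read_quality(qual: str, offset: int = 33) -> float:
    """Average per-base error probabilities, then convert back to Phred."""
    if not qual:
        return 0.0
    # Phred score Q maps to error probability P = 10^(-Q/10)
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(
    records: Iterable[FastqRecord],
    min_len: int = 1000,    # hypothetical threshold
    max_len: int = 20000,   # hypothetical threshold
    min_q: float = 10.0,    # hypothetical threshold
) -> Iterator[FastqRecord]:
    """Keep reads whose length and mean quality fall within the thresholds."""
    for rec in records:
        if min_len <= len(rec.seq) <= max_len and mean_read_quality(rec.qual) >= min_q:
            yield rec
```

For example, given a stream with one read of quality `IIII` (Phred 40) and one read of quality `!!!!!!!!` (Phred 0), calling `filter_reads(read_fastq(stream), min_len=4, max_len=100, min_q=10)` keeps only the first read. Filtering on an upper length bound as well as a lower one is a design choice that can help discard concatenated or chimeric artifacts before assembly. The real pipeline handles chimeras in a dedicated step.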
BonoboFlow and its containerized images are publicly available at https://github.com/nchis09/BonoboFlow and https://hub.docker.com/r/nchis09/bonobo_image. The test dataset is available in the SRA under BioProject accession PRJNA1137155 (http://www.ncbi.nlm.nih.gov/bioproject/1137155).