Song Haoqiu, Tithi Saima Sultana, Brown Connor, Aylward Frank O, Jensen Roderick, Zhang Liqing
Department of Computer Science, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America.
Department of Cell & Molecular Biology, St. Jude Children's Research Hospital, Memphis, TN, United States of America.
PeerJ. 2025 Jan 10;13:e18515. doi: 10.7717/peerj.18515. eCollection 2025.
Despite the recent surge of viral metagenomic studies, recovering complete virus genomes from metagenomic data remains a significant challenge. Most viral contigs produced by de novo assembly programs are highly fragmented, which complicates downstream analysis and inference. To address this issue, we developed Virseqimprover, a computational pipeline that extends assembled contigs to complete or nearly complete genomes while maintaining extension quality. Virseqimprover first uses read coverage to check whether the sequence is chimeric and, if it is, breaks the sequence into segments. It then extends the longest segment with uniform depth of coverage and repeats these steps until the sequence can no longer be extended. Finally, Virseqimprover annotates the gene content of the resulting sequence. Results show that Virseqimprover performs well at correcting viral contigs and extending them to their full lengths, and hence can be a useful tool for improving the completeness of viral contigs and minimizing their assembly errors. Both a web server and a conda package for Virseqimprover are freely available to the research community.
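The check, split, extend, and repeat loop described in the abstract can be illustrated with a minimal Python sketch. This is not Virseqimprover's code: the function names, the `tolerance` threshold, the greedy splitting rule, and the `depth_of` and `extend` callbacks are all hypothetical stand-ins, and a real pipeline would obtain depth from read mapping and extension from an assembler.

```python
def split_by_coverage(depth, tolerance=0.5):
    """Greedily split positions into half-open (start, end) segments.

    A new segment starts when a base's depth differs from the running mean
    of the current segment by more than `tolerance` times that mean.
    The rule and the threshold are illustrative assumptions only.
    """
    segments = []
    start, total = 0, 0.0
    for i, d in enumerate(depth):
        n = i - start
        if n > 0:
            seg_mean = total / n
            if abs(d - seg_mean) > tolerance * seg_mean:
                segments.append((start, i))
                start, total = i, 0.0
        total += d
    if depth:
        segments.append((start, len(depth)))
    return segments


def longest_segment(segments):
    """Return the longest (start, end) segment; ties go to the first one."""
    return max(segments, key=lambda s: s[1] - s[0])


def improve(seq, depth_of, extend, max_rounds=100):
    """Repeat: keep the longest uniform-depth segment, then try to extend it.

    `depth_of(seq)` returns per-base depth and `extend(seq)` returns a
    sequence that is longer only if extension succeeded. Both are
    caller-supplied placeholders for read mapping and assembly.
    """
    for _ in range(max_rounds):
        start, end = longest_segment(split_by_coverage(depth_of(seq)))
        trimmed = seq[start:end]
        longer = extend(trimmed)
        if len(longer) <= len(trimmed):
            return trimmed  # no further extension possible
        seq = longer
    return seq


# Toy usage: 'N' bases stand in for a high-depth chimeric tail.
toy_depth = lambda s: [30 if c == "N" else 10 for c in s]
toy_extend = lambda s: s + "G" if len(s) < 8 else s
result = improve("ACGTACNNN", toy_depth, toy_extend)
```

In the toy run, the high-depth tail is cut off in the first round and the remaining segment is extended one base at a time until the toy extender stops at eight bases.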