Instituto de Biología Integrativa de Sistemas (I2SysBio), CSIC-Universitat de València, 46980 Valencia, Spain.
Santa Fe Institute, Santa Fe, NM 87501, USA.
Viruses. 2022 May 23;14(5):1114. doi: 10.3390/v14051114.
The generation of different types of defective viral genomes (DVGs) is an unavoidable consequence of the error-prone replication of RNA viruses. In recent years, a particular class of DVGs, those containing long deletions or genome rearrangements, has gained interest due to its potential therapeutic and biotechnological applications. Identifying such DVGs in high-throughput sequencing (HTS) data has become an interesting computational problem. Several algorithms have been proposed to accomplish this goal, but all of them incur false positives, a problem of practical importance when the candidate DVGs have to be synthesized and tested in the laboratory. We present a metasearch tool, DVGfinder, that wraps the two most commonly used DVG search algorithms in a single workflow for the identification of DVGs in HTS data. DVGfinder processes the results of ViReMa-a and DI-tector and uses a gradient boosting classifier, a machine learning algorithm, to reduce the number of false-positive events. The program also generates output files in a user-friendly HTML format, which helps users explore the DVGs identified in the sample. We evaluated the performance of DVGfinder against the two search algorithms used separately and found that it slightly improves sensitivity for low-coverage synthetic HTS data and improves on the precision of DI-tector for high-coverage samples. The metasearch program also showed higher sensitivity on a real sample for which a set of copy-backs had previously been validated.
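As an illustration of the metasearch idea and of the two metrics reported above, the following minimal Python sketch merges the calls of two search tools and scores them against a known truth set. It is not DVGfinder's implementation: the `DVG` record, its fields (`bp`, `ri`, `sense`), the function names, and the toy coordinates are all hypothetical, and the gradient boosting filtering step is left out.

```python
from typing import NamedTuple


class DVG(NamedTuple):
    """Hypothetical DVG record: a junction described by two coordinates."""
    bp: int      # breakpoint: last reference position before the junction
    ri: int      # reinitiation: first reference position after the junction
    sense: str   # hypothetical label for strand orientation, e.g. "++"


def merge_calls(calls_a, calls_b):
    """Union the calls of two tools, recording which tool(s) reported each DVG."""
    merged = {}
    for tool, calls in (("A", calls_a), ("B", calls_b)):
        for dvg in calls:
            merged.setdefault(dvg, set()).add(tool)
    return merged


def sensitivity_precision(predicted, truth):
    """Return (sensitivity, precision) of a predicted set against a truth set."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    sensitivity = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision


# Toy data (invented for illustration only).
d1, d2, d3 = DVG(100, 900, "++"), DVG(250, 1200, "++"), DVG(1500, 1400, "+-")
fp1, fp2 = DVG(10, 20, "++"), DVG(700, 705, "++")

truth = {d1, d2, d3}
tool_a = [d1, d2, fp1]   # stands in for the output of one search algorithm
tool_b = [d2, d3, fp2]   # stands in for the output of the other

merged = merge_calls(tool_a, tool_b)
union_metrics = sensitivity_precision(merged, truth)

# Keeping only DVGs reported by both tools trades sensitivity for precision.
consensus = [dvg for dvg, tools in merged.items() if len(tools) == 2]
consensus_metrics = sensitivity_precision(consensus, truth)
```

In this toy case the union recovers every true DVG but carries both false positives, whereas the consensus keeps only the one DVG that both tools agree on. A trained classifier, as described in the abstract, aims for a better trade-off than either of these simple rules.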