ViPRA-Haplo: De Novo Reconstruction of Viral Populations Using Paired End Sequencing Data.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2024 May-Jun;21(3):492-500. doi: 10.1109/TCBB.2024.3374595. Epub 2024 Jun 5.

Abstract

We present ViPRA-Haplo, a de novo strain-specific assembly workflow for reconstructing viral haplotypes in a viral population from paired-end next-generation sequencing (NGS) data. The proposed Viral Path Reconstruction Algorithm (ViPRA) generates a subset of paths from a De Bruijn graph of reads using the pairing information of reads. The paths generated by ViPRA are an over-estimation of the true contigs. We propose two refinement methods to obtain an optimal set of contigs representing viral haplotypes. The first method clusters paths reconstructed by ViPRA using VSEARCH (Deorowicz et al. 2015) based on sequence similarity, while the second method, MLEHaplo, generates a maximum likelihood estimate of viral populations. We evaluated our pipeline on simulated and real viral quasispecies data from HIV and on real data from SARS-CoV-2. Experimental results show that ViPRA-Haplo, although it still overestimates the number of true contigs, outperforms the existing tool, PEHaplo, providing up to 9% better genome coverage on HIV real data. In addition, ViPRA-Haplo retains higher diversity of the viral population, as demonstrated by a higher percentage of contigs shorter than 1000 base pairs (bp) that also contain k-mers with counts below 100 (representing rarer sequences), which are absent in PEHaplo. For SARS-CoV-2 sequencing data, ViPRA-Haplo reconstructs contigs that cover more than 90% of the reference genome and was able to validate known SARS-CoV-2 strains in the sequencing data.
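To make the graph-based idea in the abstract concrete, the following is a minimal Python sketch of building a De Bruijn graph from reads and enumerating candidate paths through it. It is not the ViPRA algorithm: the function names `build_de_bruijn` and `enumerate_paths`, the parameter `max_len`, and the toy reads are assumptions made for illustration, and the sketch omits the paired-end information that ViPRA uses to select a subset of paths. It does show why naive path enumeration tends to over-estimate the true contigs, since every branch in the graph multiplies the number of candidate sequences.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a De Bruijn graph whose nodes are (k-1)-mers.

    Returns (edges, counts): edges maps each (k-1)-mer to the set of
    (k-1)-mers that follow it, and counts holds k-mer multiplicities.
    """
    edges = defaultdict(set)
    counts = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            counts[kmer] += 1
            edges[kmer[:-1]].add(kmer[1:])
    return edges, counts


def enumerate_paths(edges, source, max_len):
    """Enumerate simple paths from `source`, spelled out as sequences.

    A path ends at a node with no unvisited successor or once it
    holds `max_len` nodes (a hypothetical cap to bound the search).
    """
    sequences = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        successors = [n for n in edges.get(node, ()) if n not in path]
        if not successors or len(path) >= max_len:
            # Spell the path: first node plus last base of each later node.
            sequences.append(path[0] + "".join(n[-1] for n in path[1:]))
            continue
        for nxt in sorted(successors):
            stack.append((nxt, path + [nxt]))
    return sequences


# Toy example: two reads that differ at one position (two "haplotypes").
reads = ["ACGTAC", "ACGTTC"]
edges, counts = build_de_bruijn(reads, k=4)
for seq in sorted(enumerate_paths(edges, "ACG", max_len=10)):
    print(seq)
```

In this toy case the graph branches once, so exactly two candidate sequences come out. With real data, many branches combine into far more paths than true haplotypes, which is the situation the refinement steps described above (clustering and maximum likelihood estimation) are meant to address.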
