Viral quasispecies reconstruction via tensor factorization with successive read removal.

Affiliation

Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA.

Publication information

Bioinformatics. 2018 Jul 1;34(13):i23-i31. doi: 10.1093/bioinformatics/bty291.

Abstract

MOTIVATION

As RNA viruses mutate and adapt to environmental changes, often developing resistance to anti-viral vaccines and drugs, they form an ensemble of viral strains known as a viral quasispecies. While high-throughput sequencing (HTS) has enabled in-depth studies of viral quasispecies, sequencing errors and limited read lengths make it challenging to reconstruct the strains and estimate their spectrum. Inferring a viral quasispecies is difficult because the strain frequencies are generally non-uniform, and it becomes harder still when the genetic distances between the strains are small.

RESULTS

This paper presents TenSQR, an algorithm that uses a tensor factorization framework to analyze HTS data and reconstruct viral quasispecies whose component strains have highly uneven frequencies. Fundamentally, TenSQR performs clustering with successive data removal to infer the strains in a quasispecies in order from the most to the least abundant; every time a strain is inferred, the sequencing reads generated from that strain are removed from the dataset. The proposed successive strain reconstruction and data removal enable discovery of rare strains in a population and facilitate detection of deletions in such strains. Results on simulated datasets demonstrate that TenSQR can reconstruct full-length strains having widely different abundances, generally outperforming state-of-the-art methods at diversities of 1-10% and detecting long deletions even in rare strains. A study on a real HIV-1 dataset demonstrates that TenSQR outperforms competing methods in experimental settings as well. Finally, we apply TenSQR to analyze a Zika virus sample and reconstruct the full-length strains it contains.
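
The successive-removal loop described above can be illustrated with a minimal Python sketch. It is not the TenSQR implementation: the tensor factorization clustering step is replaced here by a simple majority-vote consensus, and the names `consensus`, `mismatches`, `successive_removal`, `max_mismatches`, and `max_strains` are hypothetical. Reads are assumed to be already aligned to a common coordinate system, with `-` marking positions a read does not cover.

```python
from collections import Counter

GAP = "-"  # assumed marker for positions a read does not cover


def consensus(reads):
    """Majority-vote base at each position, ignoring uncovered positions."""
    length = len(reads[0])
    result = []
    for pos in range(length):
        counts = Counter(r[pos] for r in reads if r[pos] != GAP)
        result.append(counts.most_common(1)[0][0] if counts else GAP)
    return "".join(result)


def mismatches(read, strain):
    """Count positions where both sequences are covered and disagree."""
    return sum(1 for a, b in zip(read, strain)
               if a != GAP and b != GAP and a != b)


def successive_removal(reads, max_mismatches=0, max_strains=10):
    """Infer strains from most to least abundant, removing explained reads.

    The consensus of the remaining reads stands in for the dominant strain
    (an assumption of this sketch; TenSQR uses tensor factorization here).
    Returns a list of (strain, estimated_frequency) pairs.
    """
    remaining = list(reads)
    total = len(reads)
    strains = []
    while remaining and len(strains) < max_strains:
        candidate = consensus(remaining)
        matched = [r for r in remaining
                   if mismatches(r, candidate) <= max_mismatches]
        if not matched:
            break  # nothing fits the candidate; stop rather than loop forever
        strains.append((consensus(matched), len(matched) / total))
        # Remove the reads attributed to this strain before the next round.
        remaining = [r for r in remaining
                     if mismatches(r, candidate) > max_mismatches]
    return strains


if __name__ == "__main__":
    # Toy data: three full-length "strains" at 60%, 30%, and 10% abundance.
    toy_reads = ["ACGT"] * 6 + ["ACTT"] * 3 + ["TCTT"]
    for strain, freq in successive_removal(toy_reads):
        print(strain, freq)
```

Removing the reads of the dominant strain in each round is what lets a rare strain become the majority signal in a later round. With real short reads, a read that covers only a region shared by several strains matches the first candidate and is removed early, so this toy version would bias the frequency estimates in ways the sketch does not correct for.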

AVAILABILITY AND IMPLEMENTATION

TenSQR is available at https://github.com/SoYeonA/TenSQR.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4bec/6022648/1089296d1dba/bty291f1.jpg
