Combinatorial analysis and algorithms for quasispecies reconstruction using next-generation sequencing.

Affiliation

Clinic of Infectious Diseases, Catholic University of Sacred Heart, Rome, Italy.

Publication information

BMC Bioinformatics. 2011 Jan 5;12:5. doi: 10.1186/1471-2105-12-5.

Abstract

BACKGROUND

Next-generation sequencing (NGS) offers a unique opportunity for high-throughput genomics and has the potential to replace Sanger sequencing in many fields, including de-novo sequencing, re-sequencing, meta-genomics, and characterisation of infectious pathogens, such as viral quasispecies. Although methodologies and software for whole genome assembly and genome variation analysis have been developed and refined for NGS data, reconstructing a viral quasispecies using NGS data remains a challenge. This application would be useful for analysing intra-host evolutionary pathways in relation to immune responses and antiretroviral therapy exposures. Here we introduce a set of formulae for the combinatorial analysis of a quasispecies, given an NGS re-sequencing experiment, together with an algorithm for quasispecies reconstruction. We require that sequenced fragments are aligned against a reference genome, and that the reference genome is partitioned into a set of sliding windows (amplicons). The reconstruction algorithm is based on combinations of multinomial distributions and is designed to minimise the reconstruction of false variants, called in-silico recombinants.
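To make the notion of an in-silico recombinant concrete, the following minimal sketch partitions a toy reference into overlapping windows and chains per-window haplotypes whose overlaps agree. It is an illustration only and not the reconstruction algorithm of the paper: the multinomial weighting is omitted, reads are assumed error-free, and the function names (`sliding_windows`, `window_haplotypes`, `candidate_chains`), the window size, and the step are all hypothetical choices made for this example.

```python
def sliding_windows(genome_length, window, step):
    """Return (start, end) pairs of overlapping windows covering the reference.

    Hypothetical partition scheme: fixed-size windows shifted by `step`.
    """
    starts = list(range(0, max(genome_length - window, 0) + 1, step))
    if starts[-1] + window < genome_length:
        # Add a final window so the end of the reference is covered.
        starts.append(genome_length - window)
    return [(s, s + window) for s in starts]


def window_haplotypes(variants, windows):
    """Distinct sequences seen in each window (error-free reads assumed)."""
    return [sorted({v[s:e] for v in variants}) for s, e in windows]


def candidate_chains(variants, windows):
    """Enumerate full-length sequences built by chaining window haplotypes.

    Two haplotypes from adjacent windows are joined only if they agree on
    the region where the windows overlap. Chains that are not true variants
    are in-silico recombinants.
    """
    haps = window_haplotypes(variants, windows)
    chains = list(haps[0])
    prev_end = windows[0][1]
    for (start, end), current in zip(windows[1:], haps[1:]):
        overlap = prev_end - start
        extended = []
        for seq in chains:
            for h in current:
                if seq[start:prev_end] == h[:overlap]:
                    extended.append(seq + h[overlap:])
        chains = extended
        prev_end = end
    return chains


# Toy example: two variants that differ only at the first and last position.
windows = sliding_windows(12, window=6, step=3)
true_variants = ["AAAAAAAAAAAA", "CAAAAAAAAAAC"]
chains = candidate_chains(true_variants, windows)
recombinants = sorted(set(chains) - set(true_variants))
print(len(chains), recombinants)
```

In this toy case the two variants are identical across every window overlap, so the overlaps cannot tell them apart and the chaining step also produces sequences that mix the start of one variant with the end of the other. When the variants differ inside the overlaps, only the true variants survive, which is why low diversity makes reconstruction harder.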

RESULTS

The reconstruction algorithm was applied to error-free simulated data and reconstructed a high percentage of true variants, even at low genetic diversity, where the chance of obtaining in-silico recombinants is high. Results on empirical NGS data from patients infected with hepatitis B virus confirmed its ability to characterise different viral variants from distinct patients.

CONCLUSIONS

The combinatorial analysis described how difficult it is to reconstruct a quasispecies, given a specified amplicon partition and a measure of population diversity. The reconstruction algorithm performed well on both simulated and real data, even in the presence of sequencing errors.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/46c6/3022557/bc7d662e813b/1471-2105-12-5-1.jpg
