Ensemble analysis of adaptive compressed genome sequencing strategies.

Publication information

BMC Bioinformatics. 2014;15 Suppl 9(Suppl 9):S13. doi: 10.1186/1471-2105-15-S9-S13. Epub 2014 Sep 10.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4221792/

Abstract

BACKGROUND

Acquiring genomes at single-cell resolution has many applications, such as the study of microbiota. However, deep sequencing and assembly of all of the millions of cells in a sample is prohibitively costly. A property that can come to the rescue is that deep sequencing of every cell should not be necessary to capture all distinct genomes, as the majority of cells are biological replicates. Biologically important samples are often sparse in that sense. In this paper, we propose an adaptive compressed method, also known as distilled sensing, to capture all distinct genomes in a sparse microbial community with reduced sequencing effort. As opposed to group testing, in which the number of distinct events is often constant and sparsity is equivalent to rarity of an event, sparsity in our case means scarcity of distinct events in comparison to the data size. Previously, we introduced the problem and proposed a distilled sensing solution based on the breadth-first search strategy. We simulated the whole process, and the computational intensity of that simulation constrained our ability to study the behavior of the algorithm for the entire ensemble.

RESULTS

In this paper, we modify our previous breadth-first search strategy and introduce the depth-first search strategy. Instead of simulating the entire process, which is intractable for a large number of experiments, we provide a dynamic programming algorithm to analyze the behavior of the method for the entire ensemble. The ensemble analysis algorithm recursively calculates the probability of capturing every distinct genome and also the expected total number of sequenced nucleotides for a given population profile. Our results suggest that the expected total number of sequenced nucleotides grows in proportion to the logarithm of the number of cells and linearly with the number of distinct genomes. The probability of missing a genome depends on its abundance and the ratio of its size to the maximum genome size in the sample. The modified resource allocation method accommodates a parameter to control that probability.
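The abstract does not spell out the search procedure, so the following is only a toy sketch of why an adaptive, depth-first pooling scheme can need far fewer sequencing steps than there are cells. It is not the squeezambler algorithm. All names here (`depth_first_capture`, `cells`, `pools_examined`) are hypothetical, and the sketch assumes an idealized pool test that reveals exactly which genome labels are present, with pools always split in half.

```python
from typing import List, Set, Tuple


def depth_first_capture(cells: List[int]) -> Tuple[Set[int], int]:
    """Toy depth-first pooling over a list of genome labels, one per cell.

    Returns the set of captured genome labels and the number of pools
    examined. Assumption: examining a pool perfectly reveals its distinct
    labels, which stands in for shallow sequencing of the pooled cells.
    """
    captured: Set[int] = set()
    pools_examined = 0
    # Half-open index ranges; a LIFO stack gives depth-first order.
    stack = [(0, len(cells))]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue  # empty pool, nothing to examine
        pools_examined += 1
        distinct = set(cells[lo:hi])
        if len(distinct) == 1:
            # Homogeneous pool: all cells are replicates, so stop splitting.
            captured |= distinct
            continue
        # Mixed pool: split in half and explore the left half first.
        mid = (lo + hi) // 2
        stack.append((mid, hi))
        stack.append((lo, mid))
    return captured, pools_examined


if __name__ == "__main__":
    # Hypothetical population: 1024 cells, two rare genomes among replicates.
    population = [0] * 1024
    population[100] = 1
    population[700] = 2
    genomes, pools = depth_first_capture(population)
    print(sorted(genomes), pools)
```

In this toy setting only pools that contain a rare cell are split, so the number of pools examined stays well below the number of cells when distinct genomes are scarce. The real method must also account for sequencing depth, genome size, and the chance of missing a genome, none of which this sketch models.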

AVAILABILITY

The squeezambler 2.0 C++ source code is available at http://sourceforge.net/projects/hyda/.
