Lonardi Stefano, Mirebrahim Hamid, Wanamaker Steve, Alpert Matthew, Ciardo Gianfranco, Duma Denisa, Close Timothy J
Department of Computer Science and Engineering, Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, Department of Computer Science, Iowa State University, Ames, IA 50011 and Baylor College of Medicine, Houston, TX 77030, USA.
Bioinformatics. 2015 Sep 15;31(18):2972-80. doi: 10.1093/bioinformatics/btv311. Epub 2015 May 20.
Since the invention of DNA sequencing in the 1970s, computational biologists have had to deal with the problem of de novo genome assembly from limited (or insufficient) depth of sequencing. In this work, we investigate the opposite problem, that is, the challenge of dealing with excessive depth of sequencing.
We explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to bacterial artificial chromosome (BAC) clones (in the context of the combinatorial pooling design we have recently proposed), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases over a certain threshold, sequencing errors make these two problems harder and harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades with more and more data. For the first problem, we propose an effective solution based on 'divide and conquer': we 'slice' a large dataset into smaller samples of optimal size, decode each slice independently, and then merge the results. Experimental results on over 15 000 barley BACs and over 4000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data.
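The 'divide and conquer' strategy for the decoding problem can be sketched in a few lines. The following is a minimal illustration, not the paper's actual pipeline: `decode_slice` is a hypothetical stand-in for the real BAC-decoding step (which relies on the combinatorial pooling design), and conflicts between slices are resolved here by a simple majority vote per read.

```python
# Sketch of "slice, decode each slice independently, merge the results".
# decode_slice is a placeholder for the real pooling-design decoder.
from collections import Counter


def make_slices(reads, slice_size):
    """Partition a large read set into slices of a chosen (optimal) size."""
    return [reads[i:i + slice_size] for i in range(0, len(reads), slice_size)]


def decode_slice(reads):
    """Placeholder decoder: map each read id to a candidate BAC label.
    Here a 'read' is a (read_id, bac_label) pair so the flow can be shown;
    the real decoder derives the label from the pooling design."""
    return {read_id: bac for read_id, bac in reads}


def merge_decodings(decodings):
    """Resolve conflicts across slices by majority vote per read."""
    votes = {}
    for decoding in decodings:
        for read_id, bac in decoding.items():
            votes.setdefault(read_id, Counter())[bac] += 1
    return {read_id: c.most_common(1)[0][0] for read_id, c in votes.items()}


# Toy data: read ids with (possibly conflicting) candidate BAC labels.
reads = [(1, "BAC_A"), (2, "BAC_B"), (1, "BAC_A"), (2, "BAC_C"), (2, "BAC_B")]
slices = make_slices(reads, slice_size=2)
merged = merge_decodings([decode_slice(s) for s in slices])
```

The point of slicing is that each slice stays below the depth threshold at which sequencing errors start to dominate, so each per-slice decoding is more reliable than a single decoding of the full ultra-deep dataset; the merge step then reconciles any disagreements.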
Python scripts to process slices and resolve decoding conflicts are available from http://goo.gl/YXgdHT; software Hashfilter can be downloaded from http://goo.gl/MIyZHs
stelo@cs.ucr.edu or timothy.close@ucr.edu
Supplementary data are available at Bioinformatics online.