When less is more: 'slicing' sequencing data improves read decoding accuracy and de novo assembly quality.

Author information

Lonardi Stefano, Mirebrahim Hamid, Wanamaker Steve, Alpert Matthew, Ciardo Gianfranco, Duma Denisa, Close Timothy J

Affiliations

Department of Computer Science and Engineering, Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, Department of Computer Science, Iowa State University, Ames, IA 50011 and Baylor College of Medicine, Houston, TX 77030, USA.

Publication information

Bioinformatics. 2015 Sep 15;31(18):2972-80. doi: 10.1093/bioinformatics/btv311. Epub 2015 May 20.

DOI:10.1093/bioinformatics/btv311
PMID:25995232
Abstract

MOTIVATION

Since the invention of DNA sequencing in the 1970s, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, we investigate the opposite problem, that is, the challenge of dealing with excessive depth of sequencing.

RESULTS

We explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to bacterial artificial chromosome (BAC) clones (in the context of the combinatorial pooling design we have recently proposed), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases over a certain threshold, sequencing errors make these two problems harder and harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades with more and more data. For the first problem, we propose an effective solution based on 'divide and conquer': we 'slice' a large dataset into smaller samples of optimal size, decode each slice independently, and then merge the results. Experimental results on over 15 000 barley BACs and over 4000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data.
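The slice, decode, and merge idea described above can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the `decode_slice` callback, and the majority-vote rule for resolving conflicting assignments are all assumptions made for illustration, and choosing the optimal slice size is outside its scope.

```python
import random
from collections import Counter, defaultdict


def slice_reads(reads, slice_size, seed=0):
    """Shuffle the reads and cut them into slices of at most slice_size."""
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + slice_size]
            for i in range(0, len(shuffled), slice_size)]


def merge_assignments(per_slice):
    """Merge per-slice {read: clone} maps by majority vote (assumed rule).

    Reads whose top two candidate clones are tied are marked None.
    """
    votes = defaultdict(Counter)
    for assignment in per_slice:
        for read, clone in assignment.items():
            if clone is not None:  # skip reads the decoder left unassigned
                votes[read][clone] += 1
    merged = {}
    for read, counter in votes.items():
        ranked = counter.most_common(2)
        tied = len(ranked) == 2 and ranked[0][1] == ranked[1][1]
        merged[read] = None if tied else ranked[0][0]
    return merged


def decode_in_slices(reads, slice_size, decode_slice, seed=0):
    """Decode each slice independently, then merge the results.

    decode_slice is a hypothetical callback that maps a list of reads
    to a {read: clone_or_None} dictionary.
    """
    slices = slice_reads(reads, slice_size, seed)
    return merge_assignments([decode_slice(chunk) for chunk in slices])


# Toy stand-in for a real decoder: a fixed lookup table (illustration only).
TOY_TRUTH = {"ACGT": "BAC_1", "TTGA": "BAC_2"}


def toy_decoder(chunk):
    return {read: TOY_TRUTH.get(read) for read in chunk}


toy_reads = ["ACGT"] * 5 + ["TTGA"] * 3 + ["GGGG"]
print(decode_in_slices(toy_reads, slice_size=3, decode_slice=toy_decoder))
```

In this sketch the decoder is passed in as a callback so that the slicing and merging logic stays independent of how reads are actually assigned to BAC clones.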

AVAILABILITY AND IMPLEMENTATION

Python scripts to process slices and resolve decoding conflicts are available from http://goo.gl/YXgdHT; software Hashfilter can be downloaded from http://goo.gl/MIyZHs

CONTACT

stelo@cs.ucr.edu or timothy.close@ucr.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1. When less is more: 'slicing' sequencing data improves read decoding accuracy and de novo assembly quality.
   Bioinformatics. 2015 Sep 15;31(18):2972-80. doi: 10.1093/bioinformatics/btv311. Epub 2015 May 20.
2. De novo meta-assembly of ultra-deep sequencing data.
   Bioinformatics. 2015 Jun 15;31(12):i9-16. doi: 10.1093/bioinformatics/btv226.
3. Gossamer--a resource-efficient de novo assembler.
   Bioinformatics. 2012 Jul 15;28(14):1937-8. doi: 10.1093/bioinformatics/bts297. Epub 2012 May 18.
4. FinisherSC: a repeat-aware tool for upgrading de novo assembly using long reads.
   Bioinformatics. 2015 Oct 1;31(19):3207-9. doi: 10.1093/bioinformatics/btv280. Epub 2015 Jun 3.
5. RepLong: de novo repeat identification using long read sequencing data.
   Bioinformatics. 2018 Apr 1;34(7):1099-1107. doi: 10.1093/bioinformatics/btx717.
6. Subset selection of high-depth next generation sequencing reads for de novo genome assembly using MapReduce framework.
   BMC Genomics. 2015;16 Suppl 12(Suppl 12):S9. doi: 10.1186/1471-2164-16-S12-S9. Epub 2015 Dec 9.
7. BFC: correcting Illumina sequencing errors.
   Bioinformatics. 2015 Sep 1;31(17):2885-7. doi: 10.1093/bioinformatics/btv290. Epub 2015 May 6.
8. Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data.
   Bioinformatics. 2015 Nov 1;31(21):3421-8. doi: 10.1093/bioinformatics/btv415. Epub 2015 Jul 14.
9. Data-dependent bucketing improves reference-free compression of sequencing reads.
   Bioinformatics. 2015 Sep 1;31(17):2770-7. doi: 10.1093/bioinformatics/btv248. Epub 2015 Apr 24.
10. QuorUM: An Error Corrector for Illumina Reads.
    PLoS One. 2015 Jun 17;10(6):e0130821. doi: 10.1371/journal.pone.0130821. eCollection 2015.

Cited by

1. Sequencing Strategy to Ensure Accurate Plasmid Assembly.
   ACS Synth Biol. 2024 Dec 20;13(12):4099-4109. doi: 10.1021/acssynbio.4c00539. Epub 2024 Nov 7.
2. PlasCAT: Plasmid Cloud Assembly Tool.
   Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae299.
3. Sequencing Strategy to Ensure Accurate Plasmid Assembly.
   bioRxiv. 2024 Jun 10:2024.03.25.586694. doi: 10.1101/2024.03.25.586694.
4. Rapid, robust plasmid verification by de novo assembly of short sequencing reads.
   Nucleic Acids Res. 2020 Oct 9;48(18):e106. doi: 10.1093/nar/gkaa727.
5. The Genome Sequence of the Octocoral - A Key Resource To Study the Impact of Climate Change in the Mediterranean.
   G3 (Bethesda). 2020 Sep 2;10(9):2941-2952. doi: 10.1534/g3.120.401371.
6. Studying the gut virome in the metagenomic era: challenges and perspectives.
   BMC Biol. 2019 Oct 28;17(1):84. doi: 10.1186/s12915-019-0704-y.
7. Theoretical and Simulation-Based Investigation of the Relationship between Sequencing Effort, Microbial Community Richness, and Diversity in Binning Metagenome-Assembled Genomes.
   mSystems. 2019 Sep 17;4(5):e00384-19. doi: 10.1128/mSystems.00384-19.
8. Interpreting Microbial Biosynthesis in the Genomic Age: Biological and Practical Considerations.
   Mar Drugs. 2017 Jun 6;15(6):165. doi: 10.3390/md15060165.
9. Comparative analysis of de novo assemblers for variation discovery in personal genomes.
   Brief Bioinform. 2018 Sep 28;19(5):893-904. doi: 10.1093/bib/bbx037.
10. Complete genome sequence of phage vB_PspS-H40/1 (formerly H40/1) that infects sp. strain H40 and is used as biological tracer in hydrological transport studies.
    Stand Genomic Sci. 2017 Feb 2;12:20. doi: 10.1186/s40793-017-0235-5. eCollection 2017.