SOPRA: Scaffolding algorithm for paired reads via statistical optimization.

Affiliation

Department of Physics and Astronomy, Rutgers University, Piscataway, New Jersey, USA.

Publication information

BMC Bioinformatics. 2010 Jun 24;11:345. doi: 10.1186/1471-2105-11-345.

DOI: 10.1186/1471-2105-11-345
PMID: 20576136
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2909219/

Abstract

BACKGROUND

High throughput sequencing (HTS) platforms produce gigabases of short read (<100 bp) data per run. While these short reads are adequate for resequencing applications, de novo assembly of moderate size genomes from such reads remains a significant challenge. These limitations could be partially overcome by utilizing mate pair technology, which provides pairs of short reads separated by a known distance along the genome.

RESULTS

We have developed SOPRA, a tool designed to exploit the mate pair/paired-end information for assembly of short reads. The main focus of the algorithm is selecting a sufficiently large subset of simultaneously satisfiable mate pair constraints to achieve a balance between the size and the quality of the output scaffolds. Scaffold assembly is presented as an optimization problem for variables associated with vertices and with edges of the contig connectivity graph. Vertices of this graph are individual contigs with edges drawn between contigs connected by mate pairs. Similar graph problems have been invoked in the context of shotgun sequencing and scaffold building for previous generation of sequencing projects. However, given the error-prone nature of HTS data and the fundamental limitations from the shortness of the reads, the ad hoc greedy algorithms used in the earlier studies are likely to lead to poor quality results in the current context. SOPRA circumvents this problem by treating all the constraints on equal footing for solving the optimization problem, the solution itself indicating the problematic constraints (chimeric/repetitive contigs, etc.) to be removed. The process of solving and removing of constraints is iterated till one reaches a core set of consistent constraints. For SOLiD sequencer data, SOPRA uses a dynamic programming approach to robustly translate the color-space assembly to base-space. For assessing the quality of an assembly, we report the no-match/mismatch error rate as well as the rates of various rearrangement errors.
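
The solve-and-remove loop described above can be illustrated with a toy sketch. This is not SOPRA's actual solver: it assumes each contig carries an orientation of +1 or -1, each mate-pair link states whether its two contigs should agree (+1) or disagree (-1), and it searches all orientations exhaustively, which is only feasible for a handful of contigs. The function names `best_orientation` and `consistent_core`, the removal rule (drop the contig touching the most violated links), and the example data are all hypothetical.

```python
from collections import Counter
from itertools import product


def best_orientation(contigs, links):
    """Exhaustively find orientations (+1/-1) that violate the fewest links.

    links: list of (a, b, j); j = +1 means a and b should share an
    orientation, j = -1 means they should be opposite (assumed encoding).
    """
    best_signs, best_violated = None, None
    for signs in product((1, -1), repeat=len(contigs)):
        s = dict(zip(contigs, signs))
        violated = [(a, b, j) for a, b, j in links if s[a] * s[b] != j]
        if best_violated is None or len(violated) < len(best_violated):
            best_signs, best_violated = s, violated
    return best_signs, best_violated


def consistent_core(contigs, links):
    """Repeat: solve, drop the contig in the most violated links, re-solve."""
    contigs, links, removed = list(contigs), list(links), []
    while True:
        orientation, violated = best_orientation(contigs, links)
        if not violated:
            return orientation, links, removed
        counts = Counter(c for a, b, _ in violated for c in (a, b))
        # sorted() makes the tie-break deterministic (alphabetical).
        worst = max(sorted(counts), key=counts.__getitem__)
        removed.append(worst)
        contigs.remove(worst)
        links = [(a, b, j) for a, b, j in links if worst not in (a, b)]


# Toy data: A-B-C-D form a consistent chain; R (e.g. a repetitive contig)
# carries links that cannot all be satisfied at once.
contigs = ["A", "B", "C", "D", "R"]
links = [("A", "B", 1), ("B", "C", 1), ("C", "D", 1),
         ("R", "A", 1), ("R", "B", -1), ("R", "C", 1), ("R", "D", -1)]

orientation, kept, removed = consistent_core(contigs, links)
print(removed)  # ['R']
```

In this example the first pass leaves two of R's links unsatisfied, so R is removed and the remaining three links form a consistent core. A realistic scaffolder would replace the exhaustive search with a scalable optimizer and would also handle contig positions, not just orientations.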

CONCLUSIONS

Applying SOPRA to real data from bacterial genomes, we were able to assemble contigs into scaffolds of significant length (N50 up to 200 Kb) with very few errors introduced in the process. In general, the methodology presented here will allow better scaffold assemblies of any type of mate pair sequencing data.
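
For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch computes it under the common definition: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the total assembled length. The function name `n50` and the example lengths are illustrative and do not come from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() needs at least one length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Illustrative lengths in kb: total is 575, so half is 287.5.
# 200 alone is below that; 200 + 150 = 350 reaches it.
print(n50([200, 150, 100, 50, 50, 25]))  # 150
```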