• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

朝着简化和精确制定片段组装的方向。

Toward simplifying and accurately formulating fragment assembly.

作者信息

Myers E W

机构信息

Department of Computer Science, University of Arizona, Tucson 85721, USA.

出版信息

J Comput Biol. 1995 Summer;2(2):275-90. doi: 10.1089/cmb.1995.2.275.

DOI:10.1089/cmb.1995.2.275
PMID:7497129
Abstract

The fragment assembly problem is that of reconstructing a DNA sequence from a collection of randomly sampled fragments. Traditionally, the objective of this problem has been to produce the shortest string that contains all the fragments as substrings, but in the case of repetitive target sequences this objective produces answers that are overcompressed. In this paper, the problem is reformulated as one of finding a maximum-likelihood reconstruction with respect to the two-sided Kolmogorov-Smirnov statistic, and it is argued that this is a better formulation of the problem. Next the fragment assembly problem is recast in graph-theoretic terms as one of finding a noncyclic subgraph with certain properties and the objectives of being shortest or maximally likely are also recast in this framework. Finally, a series of graph reduction transformations are given that dramatically reduce the size of the graph to be explored in practical instances of the problem. This reduction is very important as the underlying problems are NP-hard. In practice, the transformed problems are so small that simple branch-and-bound algorithms successfully solve them, thus permitting auxiliary experimental information to be taken into account in the form of overlap, orientation, and distance constraints.

摘要

片段组装问题是指从一组随机采样的片段中重建DNA序列。传统上,该问题的目标是生成包含所有片段作为子串的最短字符串,但在重复目标序列的情况下,此目标会产生过度压缩的答案。在本文中,该问题被重新表述为关于双边柯尔莫哥洛夫-斯米尔诺夫统计量寻找最大似然重建的问题,并且认为这是对该问题更好的表述。接下来,片段组装问题被用图论术语重新表述为寻找具有某些属性的无环子图的问题,并且最短或最大似然的目标也在此框架中重新表述。最后,给出了一系列图约简变换,这些变换显著减小了在该问题的实际实例中要探索的图的大小。这种约简非常重要,因为底层问题是NP难的。在实践中,变换后的问题非常小,以至于简单的分支定界算法能够成功解决它们,从而允许以重叠、方向和距离约束的形式考虑辅助实验信息。

相似文献

1
Toward simplifying and accurately formulating fragment assembly.朝着简化和精确制定片段组装的方向。
J Comput Biol. 1995 Summer;2(2):275-90. doi: 10.1089/cmb.1995.2.275.
2
Short superstrings and the structure of overlapping strings.短超弦与重叠弦的结构
J Comput Biol. 1995 Summer;2(2):307-32. doi: 10.1089/cmb.1995.2.307.
3
A new algorithm for DNA sequence assembly.一种用于DNA序列组装的新算法。
J Comput Biol. 1995 Summer;2(2):291-306. doi: 10.1089/cmb.1995.2.291.
4
Reconstructing strings from substrings.从子串重建字符串。
J Comput Biol. 1995 Summer;2(2):333-53. doi: 10.1089/cmb.1995.2.333.
5
Reconstruction of a string from substring precedence data.
J Comput Biol. 1995 Summer;2(2):371-81. doi: 10.1089/cmb.1995.2.371.
6
Four strikes against physical mapping of DNA.
J Comput Biol. 1995 Spring;2(1):139-52. doi: 10.1089/cmb.1995.2.139.
7
An impatient evolutionary algorithm with probabilistic tabu search for unified solution of some NP-hard problems in graph and set theory via clique finding.一种带有概率禁忌搜索的不耐烦进化算法,用于通过团发现对图论和集合论中的一些NP难问题进行统一求解。
IEEE Trans Syst Man Cybern B Cybern. 2008 Jun;38(3):645-66. doi: 10.1109/TSMCB.2008.915645.
8
Genome comparison without alignment using shortest unique substrings.使用最短唯一子串进行无需比对的基因组比较。
BMC Bioinformatics. 2005 May 23;6:123. doi: 10.1186/1471-2105-6-123.
9
A polynomial time solvable formulation of multiple sequence alignment.多重序列比对的多项式时间可解公式化表述。
J Comput Biol. 2006 Mar;13(2):309-19. doi: 10.1089/cmb.2006.13.309.
10
Likelihood DNA sequencing by hybridization.通过杂交进行似然性DNA测序。
J Biomol Struct Dyn. 1993 Dec;11(3):637-53. doi: 10.1080/07391102.1993.10508020.

引用本文的文献

1
Defining and cataloging variants in pangenome graphs.定义和编目泛基因组图谱中的变异体。
bioRxiv. 2025 Aug 4:2025.08.04.668502. doi: 10.1101/2025.08.04.668502.
2
Geometric deep learning framework for de novo genome assembly.用于从头基因组组装的几何深度学习框架。
Genome Res. 2025 Apr 14;35(4):839-849. doi: 10.1101/gr.279307.124.
3
Telomere-to-telomere assembly by preserving contained reads.通过保留包含的读数进行端粒到端粒组装。
Genome Res. 2024 Nov 20;34(11):1908-1918. doi: 10.1101/gr.279311.124.
4
Theoretical Analysis of Sequencing Bioinformatics Algorithms and Beyond.测序生物信息学算法及其他方面的理论分析
Commun ACM. 2023 Jul;66(7):118-125. doi: 10.1145/3571723. Epub 2023 Jun 22.
5
Genome assembly in the telomere-to-telomere era.端粒到端粒时代的基因组组装。
Nat Rev Genet. 2024 Sep;25(9):658-670. doi: 10.1038/s41576-024-00718-w. Epub 2024 Apr 22.
6
Genome assembly in the telomere-to-telomere era.端粒到端粒时代的基因组组装
ArXiv. 2023 Aug 15:arXiv:2308.07877v1.
7
Coverage-preserving sparsification of overlap graphs for long-read assembly.重叠图的覆盖保持稀疏化用于长读长组装。
Bioinformatics. 2023 Mar 1;39(3). doi: 10.1093/bioinformatics/btad124.
8
Evolution of complex genome architecture in gymnosperms.裸子植物复杂基因组结构的演化。
Gigascience. 2022 Aug 10;11. doi: 10.1093/gigascience/giac078.
9
Using de novo assembly to identify structural variation of eight complex immune system gene regions.利用从头组装鉴定八个复杂免疫系统基因区域的结构变异。
PLoS Comput Biol. 2021 Aug 3;17(8):e1009254. doi: 10.1371/journal.pcbi.1009254. eCollection 2021 Aug.
10
Empirical evaluation of methods for genome assembly.基因组组装方法的实证评估。
PeerJ Comput Sci. 2021 Jul 9;7:e636. doi: 10.7717/peerj-cs.636. eCollection 2021.