Suppr超能文献

朝着简化和精确制定片段组装的方向。

Toward simplifying and accurately formulating fragment assembly.

作者信息

Myers E W

机构信息

Department of Computer Science, University of Arizona, Tucson 85721, USA.

出版信息

J Comput Biol. 1995 Summer;2(2):275-90. doi: 10.1089/cmb.1995.2.275.

Abstract

The fragment assembly problem is that of reconstructing a DNA sequence from a collection of randomly sampled fragments. Traditionally, the objective of this problem has been to produce the shortest string that contains all the fragments as substrings, but in the case of repetitive target sequences this objective produces answers that are overcompressed. In this paper, the problem is reformulated as one of finding a maximum-likelihood reconstruction with respect to the two-sided Kolmogorov-Smirnov statistic, and it is argued that this is a better formulation of the problem. Next the fragment assembly problem is recast in graph-theoretic terms as one of finding a noncyclic subgraph with certain properties and the objectives of being shortest or maximally likely are also recast in this framework. Finally, a series of graph reduction transformations are given that dramatically reduce the size of the graph to be explored in practical instances of the problem. This reduction is very important as the underlying problems are NP-hard. In practice, the transformed problems are so small that simple branch-and-bound algorithms successfully solve them, thus permitting auxiliary experimental information to be taken into account in the form of overlap, orientation, and distance constraints.

摘要

片段组装问题是指从一组随机采样的片段中重建DNA序列。传统上,该问题的目标是生成包含所有片段作为子串的最短字符串,但在重复目标序列的情况下,此目标会产生过度压缩的答案。在本文中,该问题被重新表述为关于双边柯尔莫哥洛夫-斯米尔诺夫统计量寻找最大似然重建的问题,并且认为这是对该问题更好的表述。接下来,片段组装问题被用图论术语重新表述为寻找具有某些属性的无环子图的问题,并且最短或最大似然的目标也在此框架中重新表述。最后,给出了一系列图约简变换,这些变换显著减小了在该问题的实际实例中要探索的图的大小。这种约简非常重要,因为底层问题是NP难的。在实践中,变换后的问题非常小,以至于简单的分支定界算法能够成功解决它们,从而允许以重叠、方向和距离约束的形式考虑辅助实验信息。

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验