Swiftly computing center strings.

Affiliation

Lehrstuhl für Bioinformatik, Friedrich-Schiller-Universität Jena, Ernst-Abbe-Platz 2, Jena, Germany.

Publication information

BMC Bioinformatics. 2011 Apr 19;12:106. doi: 10.1186/1471-2105-12-106.

DOI:10.1186/1471-2105-12-106

PMID:21504573

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3108310/

Abstract

BACKGROUND

The center string (or closest string) problem is a classic computer science problem with important applications in computational biology. Given k input strings and a distance threshold d, we search for a string within Hamming distance at most d of each input string. This problem is NP-complete.
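The problem statement can be made concrete with a minimal Python sketch. The function names (`hamming`, `is_center`, `brute_force_center`) and the default DNA alphabet are assumptions chosen for illustration and do not come from the paper; the exhaustive search is exponential in the string length and is meant only to show what a valid answer looks like.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_center(candidate: str, strings: list[str], d: int) -> bool:
    """True if candidate lies within Hamming distance d of every input string."""
    return all(hamming(candidate, s) <= d for s in strings)


def brute_force_center(strings: list[str], d: int, alphabet: str = "ACGT"):
    """Try every string over the alphabet (illustration only, exponential time).

    Returns some center string, or None if none exists for this d.
    """
    length = len(strings[0])
    for letters in product(alphabet, repeat=length):
        candidate = "".join(letters)
        if is_center(candidate, strings, d):
            return candidate
    return None
```

For example, `is_center("ACGT", ["ACGT", "ACGA", "TCGT"], 1)` holds, whereas no center string exists for `["AAAA", "TTTT"]` with d = 1 because the two inputs differ in four positions.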

RESULTS

In this paper, we focus on exact methods for the problem that are also swift in application. We first introduce data reduction techniques that allow us to infer that certain instances have no solution, or that a center string must satisfy certain conditions. We describe how to use this information to speed up two previously published search tree algorithms. Then, we describe a novel iterative search strategy that is efficient in practice, where some of our reduction techniques can also be applied. Finally, we present results of an evaluation study for two different data sets from a biological application.
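To illustrate how a reduction rule and a search tree can interact, the following sketch combines one simple rejection test with a textbook-style bounded search tree for the decision problem. Both are assumptions made for illustration: the abstract does not state which reduction rules or which two published algorithms the paper uses, and the names `pairwise_reject` and `search_tree` are hypothetical. The rejection test relies on the triangle inequality: if two inputs differ in more than 2d positions, no string can be within distance d of both.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def pairwise_reject(strings: list[str], d: int) -> bool:
    """True if some pair of inputs is more than 2d apart, so no center exists."""
    return any(
        hamming(s, t) > 2 * d
        for i, s in enumerate(strings)
        for t in strings[i + 1:]
    )


def search_tree(strings: list[str], d: int):
    """Decide whether a center string exists for threshold d.

    Starts from the first input and moves the candidate toward a violated
    input one position at a time, with at most d moves in total.
    Returns a center string, or None if none exists.
    """
    if pairwise_reject(strings, d):
        return None  # reduction step: instance is unsolvable

    def recurse(candidate: str, budget: int):
        # Find an input string that the candidate is still too far from.
        far = next((s for s in strings if hamming(candidate, s) > d), None)
        if far is None:
            return candidate
        if budget == 0:
            return None
        diff = [i for i, (x, y) in enumerate(zip(candidate, far)) if x != y]
        # Any d + 1 differing positions contain one where a true center
        # agrees with `far`, so branching on d + 1 of them is sufficient.
        for i in diff[: d + 1]:
            moved = candidate[:i] + far[i] + candidate[i + 1:]
            found = recurse(moved, budget - 1)
            if found is not None:
                return found
        return None

    return recurse(strings[0], d)
```

In this sketch the rejection test runs once before the search starts; a fuller implementation could plausibly apply further reductions inside the recursion as well.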

CONCLUSIONS

We find that the running time for computing the optimal center string is dominated by the subroutine calls for d = d_opt - 1 and d = d_opt. Our data reduction is very effective for both, either rejecting unsolvable instances or solving trivial positions. We find that this speeds up computations considerably.
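The role of the two calls at d = d_opt - 1 and d = d_opt can be seen in a simple driver loop that raises d until a decision subroutine succeeds: the last failing call and the first successful call are the two mentioned above. The sketch below is an assumption about how such a loop might look, not the paper's iterative search strategy. The names `decide` and `optimal_center` are hypothetical, and `decide` is a deliberately naive exhaustive routine standing in for a real search tree algorithm.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def decide(strings: list[str], d: int):
    """Naive decision routine (illustration only, exponential time).

    Only characters that occur in a column need to be tried at that
    position, since any other character mismatches every input there.
    """
    columns = [sorted(set(col)) for col in zip(*strings)]
    for letters in product(*columns):
        candidate = "".join(letters)
        if all(hamming(candidate, s) <= d for s in strings):
            return candidate
    return None


def optimal_center(strings: list[str], decide_fn=decide):
    """Return (d_opt, center) by increasing d until decide_fn succeeds."""
    max_pair = max(
        (hamming(s, t) for i, s in enumerate(strings) for t in strings[i + 1:]),
        default=0,
    )
    # Lower bound from the triangle inequality: d_opt >= ceil(max_pair / 2).
    d = (max_pair + 1) // 2
    while True:
        center = decide_fn(strings, d)
        if center is not None:
            return d, center
        d += 1  # the call just made was an unsolvable instance
```

Starting from the pairwise lower bound rather than from d = 0 skips calls that a rejection rule would dismiss immediately; the loop always terminates because the first input string is itself a center once d reaches the largest pairwise distance.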
