A fast exact sequential algorithm for the partial digest problem.

Author information

Abbas Mostafa M, Bahig Hazem M

Affiliations

Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar.

Computer Science Division, Department of Mathematics, Faculty of Science, Ain Shams University, Cairo, 11566, Egypt.

Publication information

BMC Bioinformatics. 2016 Dec 22;17(Suppl 19):510. doi: 10.1186/s12859-016-1365-2.

DOI:10.1186/s12859-016-1365-2

PMID:28155644

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5259970/

Abstract

BACKGROUND

Restriction site analysis determines the locations of restriction sites after digestion by reconstructing their positions from the lengths of the cut DNA fragments. Cutting DNA with a single enzyme at different reaction times is a technique known as partial digestion. Determining the exact locations of restriction sites following a partial digestion is challenging because of the computational time required, even with the best known practical algorithm.
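The abstract does not state the problem formally. As an illustration, the sketch below assumes the standard formulation of the partial digest problem: the measured fragment lengths form the multiset of all pairwise distances between site positions, and the task is to recover the positions from that multiset. The function name `distance_multiset` and the example positions are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def distance_multiset(sites):
    """Return the multiset of pairwise distances between site positions.

    Assumption: `sites` holds distinct positions on the DNA, including
    both ends of the molecule (0 and the total length).
    """
    return Counter(abs(b - a) for a, b in combinations(sites, 2))


# Hypothetical example: five positions yield 5 * 4 / 2 = 10 fragment lengths.
sites = [0, 2, 4, 7, 10]
fragments = sorted(distance_multiset(sites).elements())
print(fragments)  # [2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
```

Computing the fragment lengths from known positions is easy. The partial digest problem is the inverse task, which is where the computational cost described above arises.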

RESULTS

In this paper, we introduce an efficient algorithm to find the exact solution for the partial digest problem. The algorithm is able to find all possible solutions for the input and works by traversing the solution tree with a breadth-first search in two stages and deleting all repeated subproblems. Two types of simulated data, random and Zhang, are used to measure the efficiency of the algorithm. We also apply the algorithm to real data for the Luciferase gene and the E. coli K12 genome.
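The following is a minimal sketch of a level-by-level (breadth-first) search that discards repeated subproblems. It is not the authors' implementation. It assumes the classic branching rule in which the largest remaining distance is placed either from the left end or from the right end, and it removes duplicates simply by storing each level in a set. The two-stage organisation mentioned in the abstract is not reproduced here, and the name `partial_digest_bfs` is hypothetical.

```python
from collections import Counter


def partial_digest_bfs(distances):
    """Return every site set whose pairwise distances match `distances`.

    Assumptions: `distances` is a non-empty list of positive integers,
    and each solution contains position 0 and the total length.
    """
    remaining = Counter(distances)
    width = max(remaining)           # the largest distance spans the molecule
    remaining[width] -= 1
    remaining = +remaining           # unary plus drops zero counts

    # A subproblem is (placed sites, remaining distances) in hashable form.
    level = {(frozenset({0, width}), frozenset(remaining.items()))}
    solutions = set()

    while level:
        next_level = set()           # a set merges identical subproblems
        for sites, rem_items in level:
            rem = Counter(dict(rem_items))
            if not rem:              # every distance explained: a solution
                solutions.add(tuple(sorted(sites)))
                continue
            y = max(rem)
            # Place the largest remaining distance from the left or right end.
            for candidate in {y, width - y}:
                needed = Counter(abs(candidate - s) for s in sites)
                if all(rem[d] >= c for d, c in needed.items()):
                    new_rem = rem - needed
                    next_level.add(
                        (sites | {candidate}, frozenset(new_rem.items()))
                    )
        level = next_level
    return sorted(solutions)


print(partial_digest_bfs([2, 2, 3, 3, 4, 5, 6, 7, 8, 10]))
# [(0, 2, 4, 7, 10), (0, 3, 6, 8, 10)]
```

The two results are mirror images of each other, which is expected because a distance multiset cannot distinguish a site set from its reflection. Representing each level as a set of hashable states is only one simple way to remove repeated subproblems; the paper may use a different data structure.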

CONCLUSION

Our algorithm is a fast tool for finding the exact solution to the partial digest problem. In the worst case, it improves on the best known practical algorithm by more than 75%. For large numbers of inputs, our algorithm solves the problem in a reasonable time, whereas the best known practical algorithm cannot.
