Skiena S S, Sundaram G
Dept. of Computer Science, State University of New York, Stony Brook 11794, USA.
Proc Int Conf Intell Syst Mol Biol. 1993;1:362-70.
We present a new practical algorithm to resolve the experimental data of restriction site analysis, which is a common technique for mapping DNA. Specifically, we assert that multiple digests with a single restriction enzyme can provide sufficient information to identify the positions of the restriction sites with high probability. The motivation for the new approach comes from combinatorial results on the number of mutually homeometric sets in one dimension, where two sets of n points are homeometric if the multiset of (n2) distances they determine are the same. Since experimental data contains error, we propose algorithms for reconstructing sets from noisy interpoint distances, including the possibility of missing fragments. We analyze the performance of these algorithms under a reasonable probability distribution, establishing a relative error limit of r = theta (1/n2) beyond which our technique becomes infeasible. Through simulations, we establish that our technique is robust enough to reconstruct data with relative errors of up to 7.0% in the measured fragment lengths for typical problems, which appears sufficient for certain biological applications.
我们提出了一种新的实用算法来解析限制性酶切位点分析的实验数据,这是一种常用于绘制DNA图谱的技术。具体而言,我们断言用单一限制性酶进行多次酶切可以提供足够的信息,以高概率识别限制性酶切位点的位置。这种新方法的动机来自于一维中相互等距集数量的组合结果,其中如果两组n个点所确定的(n²)个距离的多重集相同,则这两组点是等距的。由于实验数据包含误差,我们提出了从有噪声的点间距离重建集合的算法,包括缺失片段的可能性。我们在合理的概率分布下分析了这些算法的性能,确定了相对误差极限r = theta (1/n²),超过该极限我们的技术将变得不可行。通过模拟,我们确定我们的技术足够稳健,对于典型问题,在测量的片段长度中相对误差高达7.0%时仍能重建数据,这对于某些生物学应用似乎已经足够。