Siegel A F, Roach J C, van den Engh G
Department of Management Science, Finance, and Statistics, University of Washington, Seattle 98195, USA.
J Comput Biol. 1998 Spring;5(1):101-11. doi: 10.1089/cmb.1998.5.101.
Consider a DNA mapping project in which overlap of clones is inferred from multiple complete restriction enzyme digests. Each enzyme cuts each clone randomly into fragments whose lengths are measured with some error. Clones that share fragments of matching length may contain a region of overlap. However, fragments can also match in length by random coincidence, leading to a false declaration of overlap. Although the probability of any single false fragment match is small, a mapping project involves a large number of clone comparisons, so erroneous fragment matches can be a serious problem. We use a geometrical probability approach to develop exact integral formulas and first-order approximations for the expected number and variance of classes of fragment pairs that will be falsely identified as matching. We also find exact formulas for the expected value and variance of the number of true fragment matches. These formulas are useful in comparing different mapping strategies.
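The false-match problem described above can be illustrated with a small Monte Carlo sketch. This is not the paper's geometrical probability derivation; it is a simulation under simplifying assumptions that are not taken from the abstract: a single enzyme, a fixed number of cut sites placed uniformly at random, two clones that share no sequence, and a match declared when two fragment lengths differ by at most an absolute tolerance `tol`. All function and parameter names are hypothetical.

```python
import random


def digest(clone_length, n_cuts, rng):
    """Cut a clone at n_cuts uniformly random positions (assumed model);
    return the resulting fragment lengths."""
    cuts = sorted(rng.uniform(0.0, clone_length) for _ in range(n_cuts))
    edges = [0.0] + cuts + [clone_length]
    return [b - a for a, b in zip(edges, edges[1:])]


def count_matches(frags_a, frags_b, tol):
    """Count fragment pairs (one from each clone) whose lengths
    differ by at most tol (assumed absolute measurement tolerance)."""
    return sum(1 for x in frags_a for y in frags_b if abs(x - y) <= tol)


def simulate_false_matches(clone_length, n_cuts, tol, trials, seed=0):
    """Estimate the mean and sample variance of the number of coincidental
    fragment matches between two independent, non-overlapping clones."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        frags_a = digest(clone_length, n_cuts, rng)
        frags_b = digest(clone_length, n_cuts, rng)
        counts.append(count_matches(frags_a, frags_b, tol))
    mean = sum(counts) / trials
    var = sum((c - mean) ** 2 for c in counts) / (trials - 1)
    return mean, var


if __name__ == "__main__":
    # Illustrative values only: 40 kb clones, 10 cut sites, 50 bp tolerance.
    mean, var = simulate_false_matches(40000.0, 10, 50.0, trials=2000)
    print(f"mean false matches per clone pair: {mean:.3f}, variance: {var:.3f}")
```

Because the two simulated clones are independent, every match counted here is a coincidence. Multiplying the estimated mean by the number of clone pairs compared in a project gives a rough sense of why a small per-pair probability can still produce many false matches.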