Jarchau T, Vogt H
Institut für Genetik und Mikrobiologie der Universität Würzburg, Germany.
J Theor Biol. 1991 Dec 21;153(4):445-53. doi: 10.1016/s0022-5193(05)80149-3.
Overlapping subsequences of a DNA sequence are not independent, even if the individual nucleotides are assumed to be independent. The commonly used geometric distribution for the length of restriction fragments is therefore not exact. The exact distribution of this random variable is derived for non-overlapping restriction sites in a DNA sequence with an infinite (or very large) number of nucleotides. The correction for the finite case is straightforward. The simple geometric distribution is shown to be a good approximation as long as the basic probability that the recognition sequence occurs at a given site is small.
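The comparison described in the abstract can be illustrated numerically. The sketch below is not the authors' derivation. It uses a standard renewal argument under assumptions that are ours: nucleotides are independent, the recognition sequence has length m and occurs at a given position with probability p, the sequence cannot overlap with itself (our reading of "non-overlapping"), and fragment length is measured as the distance between the starts of consecutive sites. Under these assumptions the length L satisfies P(L = n) = 0 for n < m and P(L = n) = p * (1 - P(L <= n - m)) for n >= m. The function names are hypothetical.

```python
def exact_fragment_pmf(p, m, n_max):
    """Return [P(L = 0), ..., P(L = n_max)] for a non-self-overlapping site.

    Assumes an infinite i.i.d. sequence; p is the probability that the
    recognition sequence (length m) occurs at a given position.
    Renewal recursion: P(L = n) = p * (1 - P(L <= n - m)) for n >= m.
    """
    pmf = [0.0] * (n_max + 1)
    cdf = [0.0] * (n_max + 1)
    for n in range(1, n_max + 1):
        if n >= m:
            pmf[n] = p * (1.0 - cdf[n - m])
        cdf[n] = cdf[n - 1] + pmf[n]
    return pmf


def geometric_pmf(p, n_max):
    """Return the geometric approximation P(L = n) = p * (1 - p)**(n - 1)."""
    return [0.0] + [p * (1.0 - p) ** (n - 1) for n in range(1, n_max + 1)]


def max_abs_difference(p, m, n_max):
    """Largest pointwise gap between the exact and geometric probabilities."""
    exact = exact_fragment_pmf(p, m, n_max)
    approx = geometric_pmf(p, n_max)
    return max(abs(e - g) for e, g in zip(exact, approx))


if __name__ == "__main__":
    # Illustrative values only: a 6-base site with equiprobable nucleotides.
    p_six_cutter = 0.25 ** 6
    print(max_abs_difference(p_six_cutter, m=6, n_max=20000))
```

For m = 1 the recursion reduces to the geometric distribution exactly. For longer sites the two differ mainly at short lengths, where fragments shorter than m are impossible under the exact model, and the gap shrinks as p becomes small.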