Rinsma I, Hendy M, Penny D
Department of Mathematics, Massey University, Palmerston North, New Zealand.
Bull Math Biol. 1990;52(3):349-58. doi: 10.1007/BF02458576.
When two strings of symbols are aligned, it is important to know whether the observed number of matches is better than that expected between two independent sequences with the same symbol frequencies. When the strings are of different lengths, nulls need to be inserted in order to align the sequences. One approach is to use simple approximations based on sampling with replacement. We describe an algorithm for exactly determining the frequencies of given numbers of matches when sampling without replacement. This does not lead to a simple closed-form expression. However, we show examples where sampling with, or without, replacement gives very similar results, and the simple approach may be adequate for all but the smallest cases.
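The contrast between the two sampling models can be illustrated with a minimal sketch. It is not the algorithm described in the paper: it assumes two strings of equal length (so no nulls are inserted), obtains the without-replacement distribution by brute-force enumeration of every ordering of the second string, and uses a binomial model for the with-replacement approximation. The function names `exact_match_distribution` and `binomial_match_distribution` are hypothetical, chosen for illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import comb


def exact_match_distribution(a, b):
    """Sampling without replacement (illustrative brute force).

    Keeps `a` fixed and enumerates every ordering of `b`, so the symbol
    counts of both strings are preserved exactly. Assumes len(a) == len(b)
    and is only feasible for very short strings (n! orderings).
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length strings")
    counts = Counter()
    total = 0
    for perm in permutations(b):
        matches = sum(x == y for x, y in zip(a, perm))
        counts[matches] += 1
        total += 1
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}


def binomial_match_distribution(a, b):
    """Sampling with replacement (simple approximation).

    Treats each position as an independent draw, with match probability
    p = sum over symbols of freq_a(s) * freq_b(s).
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length strings")
    n = len(a)
    freq_a, freq_b = Counter(a), Counter(b)
    p = sum(Fraction(freq_a[s], n) * Fraction(freq_b[s], n) for s in freq_a)
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


if __name__ == "__main__":
    a, b = "AABB", "AABB"
    print("without replacement:", exact_match_distribution(a, b))
    print("with replacement:   ", binomial_match_distribution(a, b))
```

For the strings `AABB` and `AABB`, the exact distribution puts probability 1/6, 2/3 and 1/6 on 0, 2 and 4 matches, while the binomial model spreads probability over 0 to 4 matches (3/8 on exactly 2). Both have the same mean of 2 matches, but they differ noticeably in shape, as one would expect for a case this small.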