Reich J G, Drabsch H, Däumler A
Nucleic Acids Res. 1984 Jul 11;12(13):5529-43. doi: 10.1093/nar/12.13.5529.
The statistical behavior of the similarity score for unrelated DNA sequences, calculated either by letter-by-letter comparison or from various forms of optimal alignment, was studied. Natural DNA sequences from a database and truly random sequences were found to show the same statistical behavior in terms of such scores. This makes it possible to adopt a simple criterion for rejecting fortuitous similarity. The criterion is based on the mean and standard deviation of chance scores, whose expected values depend on chain length, gap penalty and probability of letter coincidence and may be calculated from formulae given in the paper.
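The mean-and-standard-deviation criterion can be illustrated for the simplest case, letter-by-letter comparison without gaps. The sketch below is not the paper's own formulae, which are not reproduced in this abstract. It assumes that each position matches independently with probability p, so that the chance score follows a binomial distribution with mean n·p and standard deviation sqrt(n·p·(1−p)). The function names, the default p = 0.25 (four equally frequent letters) and any cutoff applied to the resulting z-score are illustrative assumptions. Scores from optimal alignment with gaps need the gap-penalty-dependent formulae from the paper and are not covered here.

```python
import math


def match_count(a: str, b: str) -> int:
    """Letter-by-letter score: number of positions at which a and b agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b))


def chance_score_stats(n: int, p: float) -> tuple[float, float]:
    """Mean and standard deviation of the chance score.

    Assumption: each of the n positions matches independently with
    probability p (binomial model), so no gaps are allowed.
    """
    return n * p, math.sqrt(n * p * (1.0 - p))


def z_score(a: str, b: str, p: float = 0.25) -> float:
    """Observed score expressed in standard deviations above the chance mean.

    p = 0.25 assumes four equally frequent letters; for skewed base
    composition, p would be the sum of squared letter frequencies.
    """
    mean, sd = chance_score_stats(len(a), p)
    return (match_count(a, b) - mean) / sd
```

A similarity would then be treated as fortuitous when the z-score stays below a chosen cutoff. The cutoff itself (for example, a few standard deviations) is a choice left to the user and is not taken from the abstract.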