Gorban' A N, Popova T G, Sadovskiĭ M G
Zh Obshch Biol. 1994 Jul-Oct;55(4-5):420-30.
A new method for evaluating the similarity between two nucleotide sequences is proposed. It is based on comparing the frequency/correlation dictionaries of the sequences under investigation. A dictionary is the set of all strings of various lengths occurring within a sequence, together with their frequencies. The method allows sequences of arbitrary length to be compared, and its advantage is that no informal (expert) choice of the best fit is required. The efficiency of the method is demonstrated by comparing several human genes and viruses.
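The idea of a frequency dictionary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which comparison measure is used, so the half-L1 (total variation) distance below is an assumption chosen only for illustration, and the function names `frequency_dictionary` and `dictionary_distance` are hypothetical. The sketch handles a single word length `k`, whereas the abstract describes dictionaries of strings of various lengths.

```python
from collections import Counter


def frequency_dictionary(seq, k):
    """Return the relative frequency of every length-k string occurring in seq."""
    total = len(seq) - k + 1  # number of overlapping windows of length k
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {word: n / total for word, n in counts.items()}


def dictionary_distance(seq_a, seq_b, k):
    """Compare two sequences through their length-k frequency dictionaries.

    ASSUMPTION: the measure here is half the L1 distance between the two
    frequency dictionaries (0 = identical dictionaries, 1 = no shared words).
    The abstract does not specify the measure actually used.
    """
    freq_a = frequency_dictionary(seq_a, k)
    freq_b = frequency_dictionary(seq_b, k)
    words = set(freq_a) | set(freq_b)
    return 0.5 * sum(abs(freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) for w in words)


if __name__ == "__main__":
    # Sequences of different lengths can be compared directly,
    # because only relative word frequencies enter the measure.
    print(dictionary_distance("ACGTACGT", "ACGTTGCA", 2))
```

Because the comparison uses relative frequencies only, no alignment of the two sequences is needed, which is consistent with the abstract's remark that sequences of arbitrary length can be compared without an expert choice of the best fit.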