Blaisdell B E
Proc Natl Acad Sci U S A. 1986 Jul;83(14):5155-9. doi: 10.1073/pnas.83.14.5155.
Determination of first- and second-order Markov chain homogeneity of sets of nuclear eukaryotic DNA sequences, both coding and noncoding, finds similarities imperceptible to the standard Needleman-Wunsch base matching or dot-matrix algorithms. These measures of the similarities of the distributions of adjacent pairs or triplets are in agreement with accepted evolutionary-tree topologies. Hierarchical clustering of the distributions of doublets of 30 miscellaneous coding sequences gives clusters in reasonable agreement with accepted biological classifications. In addition to similarity by homology, there is also observed similarity of disparate genes in the same organism--for example, all three disparate yeast genes (two enzymes and actin) form a well-distinguished cluster.
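The doublet distributions mentioned in the abstract can be illustrated with a short sketch. The abstract does not state the exact homogeneity statistic or clustering criterion, so the Euclidean distance below is only an illustrative stand-in, and the function names `doublet_frequencies` and `distribution_distance` are hypothetical. The sketch assumes that doublets are counted over overlapping adjacent positions and that any pair containing a character other than A, C, G, or T is ignored.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# The 16 possible adjacent base pairs, in a fixed order: AA, AC, ..., TT.
DOUBLETS = ["".join(p) for p in product(BASES, repeat=2)]


def doublet_frequencies(seq):
    """Return relative frequencies of the 16 overlapping doublets in seq.

    Assumption: pairs containing anything other than A/C/G/T are skipped.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DOUBLETS)
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T doublets")
    return [counts[d] / total for d in DOUBLETS]


def distribution_distance(p, q):
    """Euclidean distance between two doublet distributions.

    Illustrative stand-in only; not the statistic used in the paper.
    """
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
```

Computing `distribution_distance` for every pair of sequences would give a distance matrix that could then be passed to a hierarchical clustering routine. No sequence alignment is involved at any step, which is the property the abstract contrasts with Needleman-Wunsch and dot-matrix comparison.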