Karlin S, Ghandour G
Proc Natl Acad Sci U S A. 1985 Sep;82(17):5800-4. doi: 10.1073/pnas.82.17.5800.
Four categories of data representations are used to help interpret structures and similarities of nucleic acid and protein sequences. The statistical significance of the observed relationships revealed by these representations is assessed by a hierarchy of permutation procedures and by comparisons with theoretical random models. Applications are presented for various DNA sequences, including papovaviruses, Epstein-Barr virus, mitochondrial genomes, and several globin and immunoglobulin genes.
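The abstract does not describe its permutation procedures in detail. The sketch below illustrates only the general idea behind the simplest level of such a hierarchy, and it is not the authors' method. It shuffles the bases of a sequence, which keeps base composition fixed and destroys order, and estimates an empirical p-value for an observed statistic. The statistic used here, the overlapping CpG count, and the function names `count_dinucleotide` and `permutation_p_value` are hypothetical choices made for illustration. Higher levels, such as shuffles that also preserve dinucleotide counts, are not shown.

```python
import random
from typing import Callable


def count_dinucleotide(seq: str, dinuc: str = "CG") -> int:
    """Count overlapping occurrences of a dinucleotide (hypothetical example statistic)."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


def permutation_p_value(
    seq: str,
    statistic: Callable[[str], float],
    n_perm: int = 1000,
    tail: str = "upper",
    seed: int = 0,
) -> float:
    """Empirical p-value of `statistic(seq)` under mononucleotide shuffling.

    Each permutation keeps the base composition of `seq` fixed but randomizes
    the order. This corresponds only to the lowest level of a shuffling hierarchy.
    """
    if tail not in ("upper", "lower"):
        raise ValueError("tail must be 'upper' or 'lower'")
    rng = random.Random(seed)
    observed = statistic(seq)
    letters = list(seq)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(letters)  # uniform random permutation of the bases
        s = statistic("".join(letters))
        if (s >= observed) if tail == "upper" else (s <= observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    seq = "CG" * 20 + "AT" * 20  # toy sequence with clustered CpG steps
    p = permutation_p_value(seq, count_dinucleotide, n_perm=500, seed=1)
    print(f"observed CpG = {count_dinucleotide(seq)}, upper-tail p = {p:.4f}")
```

A lower-tail test (`tail="lower"`) would be the natural choice when asking whether a dinucleotide is depleted rather than enriched. Comparing the observed count with an analytic expectation under an independent-letters model would be the corresponding "theoretical random model" check.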