Scherer S, McPeek M S, Speed T P
Human Genome Center, Lawrence Berkeley Laboratory, Berkeley, CA 94720.
Proc Natl Acad Sci U S A. 1994 Jul 19;91(15):7134-8. doi: 10.1073/pnas.91.15.7134.
Large genomic DNA sequences contain regions with distinctive patterns of sequence organization. We describe a method using logarithms of probabilities based on seventh-order Markov chains to rapidly identify genomic sequences that do not resemble models of genome organization built from compilations of octanucleotide usage. Databases have been constructed from Escherichia coli and Saccharomyces cerevisiae DNA sequences of >1000 nt and human sequences of >10,000 nt. Atypical genes and clusters of genes have been located in bacteriophage, yeast, and primate DNA sequences. We consider criteria for statistical significance of the results, offer possible explanations for the observed variation in genome organization, and give additional applications of these methods in DNA sequence analysis.
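The scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the add-one pseudocount, the sliding-window size and step, and the use of a mean log probability per window are all assumptions made here for illustration. The only elements taken from the abstract are that octanucleotide counts define a seventh-order Markov chain and that sequences are scored by logarithms of probabilities.

```python
import math
from collections import Counter

ALPHABET = "ACGT"
ORDER = 7  # seventh-order chain: each base is conditioned on the 7 bases before it


def count_octamers(sequences):
    """Tally every overlapping 8-mer (octanucleotide) in the training sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - ORDER):
            word = seq[i:i + ORDER + 1]
            if all(base in ALPHABET for base in word):  # skip ambiguous bases
                counts[word] += 1
    return counts


def log_prob(counts, word, pseudocount=1.0):
    """log P(last base | preceding 7 bases), smoothed with an assumed pseudocount."""
    context = word[:ORDER]
    total = sum(counts[context + base] for base in ALPHABET)
    numerator = counts[word] + pseudocount
    denominator = total + pseudocount * len(ALPHABET)
    return math.log(numerator / denominator)


def window_scores(counts, seq, window=100, step=50):
    """Mean log probability per window; window and step sizes are assumptions."""
    seq = seq.upper()
    scores = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        values = []
        for i in range(len(chunk) - ORDER):
            word = chunk[i:i + ORDER + 1]
            if all(base in ALPHABET for base in word):
                values.append(log_prob(counts, word))
        mean = sum(values) / len(values) if values else float("nan")
        scores.append((start, mean))
    return scores
```

Under these assumptions, windows whose mean log probability is much lower than that of the rest of the sequence are candidates for atypical regions. Deciding how low is low enough is the statistical significance question the abstract raises, and this sketch does not address it.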