Nandy A
Computer Division, Indian Institute of Chemical Biology, Calcutta.
Indian J Biochem Biophys. 1994 Jun;31(3):149-55.
The sequencing of very large genes comprising tens to hundreds of thousands of bases raises important questions about how the data should be viewed and analysed, and about what new information and rules on genome characterisation and organisation may exist in nature. Representing such gene sequences requires more compact techniques than the widely used letter-series method; several graphical techniques have been proposed that partially meet this need and have revealed new patterns in the sequence composition of conserved genes. Investigations using chaos techniques have displayed self-similarities in the graphical representations of several classes of gene sequences, pointing to a fractal nature embedded in the sequence organisation in a scale-independent manner. Scale invariance is also seen in studies that have revealed unexpected correlations in gene sequences, implying that a nucleotide at any position influences nucleotides thousands of base positions away. These observations have given rise to a new way of looking at long gene sequences, but questions about the origin of such patterns and correlations, and their implications for gene sequences and evolution, remain to be resolved. This review provides a brief overview of recent papers covering the techniques used and the issues raised in these investigations of the global characteristics of long DNA sequences.
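The abstract refers to two families of technique without spelling them out. As a rough illustration only, the Python sketch below assumes that "chaos techniques" means the chaos game representation (CGR), in which each nucleotide moves a point halfway towards one corner of a unit square, and that the correlation studies use a purine-pyrimidine DNA walk with a root-mean-square fluctuation function F(l). The corner layout, the walk rule, and the function names (`cgr_points`, `dna_walk`, `fluctuation`) are assumptions for illustration and are not taken from this review.

```python
import math

# Assumed CGR corner layout (A bottom-left, C top-left, G top-right, T bottom-right).
CGR_CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

# Assumed walk rule: pyrimidines (C, T) step up, purines (A, G) step down.
WALK_STEPS = {"C": 1, "T": 1, "A": -1, "G": -1}


def cgr_points(seq):
    """Hypothetical helper: chaos game representation points for a DNA string.

    Starting at the centre of the unit square, each nucleotide moves the
    current point halfway towards its corner. Non-ACGT symbols are skipped.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        corner = CGR_CORNERS.get(base)
        if corner is None:
            continue
        x = (x + corner[0]) / 2.0
        y = (y + corner[1]) / 2.0
        points.append((x, y))
    return points


def dna_walk(seq):
    """Hypothetical helper: cumulative displacement y(i) of a purine-pyrimidine walk."""
    ys = [0]
    for base in seq.upper():
        step = WALK_STEPS.get(base)
        if step is not None:
            ys.append(ys[-1] + step)
    return ys


def fluctuation(seq, window):
    """Hypothetical helper: RMS fluctuation F(l) of walk displacements over length l.

    F(l)^2 = <dy(l)^2> - <dy(l)>^2. For an uncorrelated sequence F(l) grows
    roughly as l**0.5; a consistently larger exponent suggests long-range
    correlation.
    """
    ys = dna_walk(seq)
    diffs = [ys[i + window] - ys[i] for i in range(len(ys) - window)]
    if window < 1 or not diffs:
        raise ValueError("window must be >= 1 and shorter than the walk")
    mean = sum(diffs) / len(diffs)
    mean_sq = sum(d * d for d in diffs) / len(diffs)
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(0.0, mean_sq - mean * mean))


if __name__ == "__main__":
    print(cgr_points("ACGT"))
    print(dna_walk("ACGT"))
    print(fluctuation("ACGT", 1))
```

For "ACGT" the CGR points are (0.25, 0.25), (0.125, 0.625), (0.5625, 0.8125) and (0.78125, 0.40625). Plotting the points of a long sequence reveals the nested, self-similar square patterns the abstract mentions. Estimating the scaling exponent would require computing F(l) over many window lengths l and fitting a slope on a log-log plot, which this sketch leaves out.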