Sadovsky Michael G
Institute of Biophysics of Siberian Division of Russian Academy of Sciences, Akademgorodok, Krasnoyarsk, 660036.
J Biol Phys. 2003 Mar;29(1):23-38. doi: 10.1023/A:1022554613105.
The information capacity of nucleotide sequences is defined through the calculation of the specific entropy of their frequency dictionary. The specific entropy of the frequency dictionary is calculated against the reconstructed dictionary; the latter bears the most probable continuations of the shorter strings. The measure thus developed makes it possible to distinguish sequences both from random ones and from those with a high level of (rather simple) order. Some implications of the developed methodology for genetics, bioinformatics, and molecular biology are discussed.
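The abstract does not state the formula, so the following Python sketch is an illustration under stated assumptions, not the author's implementation. It assumes that the frequency dictionary of thickness q holds the relative frequencies of all overlapping strings of length q, and that the reconstructed frequency of a string is f(left) * f(right) / f(overlap), where left and right are its two substrings of length q-1 and overlap is their common part of length q-2. The specific entropy is then taken as the relative entropy of the real dictionary against the reconstructed one. The function names are hypothetical, and the sequence is treated as linear, so the reconstructed frequencies are normalized only approximately.

```python
from collections import Counter
from math import log


def frequency_dictionary(seq, q):
    """Relative frequencies of all overlapping strings of length q in seq."""
    counts = Counter(seq[i:i + q] for i in range(len(seq) - q + 1))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def specific_entropy(seq, q):
    """Relative entropy (in nats) of the real q-dictionary against the
    dictionary reconstructed from strings of length q-1.

    Assumed reconstruction (not given in the abstract):
        expected(w) = f(w[:-1]) * f(w[1:]) / f(w[1:-1])
    with the divisor taken as 1 when q == 2.
    """
    if q < 2:
        raise ValueError("q must be at least 2")
    f_q = frequency_dictionary(seq, q)
    f_short = frequency_dictionary(seq, q - 1)
    f_overlap = frequency_dictionary(seq, q - 2) if q > 2 else None

    entropy = 0.0
    for word, freq in f_q.items():
        overlap = f_overlap[word[1:-1]] if q > 2 else 1.0
        expected = f_short[word[:-1]] * f_short[word[1:]] / overlap
        entropy += freq * log(freq / expected)
    return entropy


if __name__ == "__main__":
    periodic = "ACGT" * 50
    for q in (2, 3, 4):
        print(q, round(specific_entropy(periodic, q), 4))
```

For the strictly periodic test sequence used here, the value at q = 2 is close to ln 4 (single-nucleotide frequencies say nothing about which pairs occur), while at q = 3 and above it is close to zero, because the pairs already determine every longer string.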