Department of Mathematics, Bohai University, Jinzhou 121013, People's Republic of China.
J Comput Chem. 2011 Mar;32(4):675-80. doi: 10.1002/jcc.21656. Epub 2010 Oct 1.
A DNA primary sequence is a string of letters over the alphabet Ω = {a, c, g, t}. Using all 2-combinations of the set Ω, with repetition allowed, we transform a DNA primary sequence into a sequence over a set of cardinality 10. With this 10-letter sequence we associate 10 nonnegative numerical sequences, from which we derive a 10-component vector by means of a weighted pseudo-entropy. The vector reflects information about the elements of a sequence and, in particular, the order relation among them. This new quantitative characterization of DNA sequences is sensitive to substitution of string elements. An examination of the relationships among the β-globin genes of 15 species illustrates the utility of the proposed approach.
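The 10-letter alphabet can be made concrete with a short sketch. The abstract fixes only the alphabet itself: the 2-combinations of Ω with repetition, of which there are exactly 10. It does not say how a DNA string is rewritten over that alphabet, so the rule below (each overlapping adjacent pair of bases is replaced by its unordered pair) is an assumption made for illustration. The names `PAIRS` and `to_pair_sequence` are hypothetical, and the weighted pseudo-entropy is not reproduced because its definition is not given here.

```python
from itertools import combinations_with_replacement

ALPHABET = "acgt"

# The 10 unordered pairs (2-combinations with repetition) of {a, c, g, t}:
# aa, ac, ag, at, cc, cg, ct, gg, gt, tt
PAIRS = ["".join(p) for p in combinations_with_replacement(ALPHABET, 2)]


def to_pair_sequence(dna: str) -> list[str]:
    """Rewrite a DNA string over the 10-letter pair alphabet.

    ASSUMPTION (not stated in the abstract): every overlapping adjacent
    pair of bases is mapped to its unordered pair, so "ca" and "ac"
    both become "ac".
    """
    s = dna.lower()
    unknown = set(s) - set(ALPHABET)
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    # Sorting the two characters discards their order within the pair.
    return ["".join(sorted(s[i:i + 2])) for i in range(len(s) - 1)]


if __name__ == "__main__":
    print(PAIRS)
    print(to_pair_sequence("atgc"))
```

Under this assumed rule, a single base substitution changes at most two neighbouring letters of the derived sequence, which is one way a descriptor built on it could be sensitive to substitutions.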