School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China.
J Theor Biol. 2011 Jan 21;269(1):123-30. doi: 10.1016/j.jtbi.2010.10.018. Epub 2010 Oct 20.
In this article, we introduce three 3D graphical representations of DNA primary sequences, which we call the RY-curve, MK-curve and SW-curve, based on three classifications of the DNA bases. These representations have two advantages: (i) the 3D curves are strictly non-degenerate, so no information is lost when a DNA sequence is converted to its mathematical representation; and (ii) the coordinates of every node on the curves have a clear biological meaning. We present two applications of the curves: (a) a simple formula is derived to calculate the content of the four bases (A, G, C and T) from the coordinates of nodes on the curves; and (b) a 12-component characteristic vector, built from the geometric centers of the 3D curves, is used to compare the similarity of DNA sequences from different species. As examples, we examine the similarity among the coding sequences of the first exon of the beta-globin gene from eleven species and validate the similarity of the cDNA sequences of the beta-globin gene from eight species.
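To make application (a) concrete, the sketch below shows why three two-class groupings carry enough information to recover the content of all four bases. It assumes the standard groupings suggested by the curve names (R/Y: purine A, G versus pyrimidine C, T; M/K: amino A, C versus keto G, T; S/W: strong G, C versus weak A, T) and works on plain class counts. It does not reproduce the authors' curve coordinates or their formula, which the abstract does not give; the function names `class_counts` and `base_content` are hypothetical.

```python
# Assumed groupings of the four bases (one class from each pair is enough,
# because the other class is simply its complement).
CLASSES = {
    "R": "AG",  # purines; the complementary class Y is C, T
    "M": "AC",  # amino; the complementary class K is G, T
    "S": "GC",  # strong hydrogen bonding; the complementary class W is A, T
}


def class_counts(seq):
    """Count how many bases of seq fall in the R, M and S classes."""
    seq = seq.upper()
    return {
        name: sum(seq.count(base) for base in members)
        for name, members in CLASSES.items()
    }


def base_content(seq):
    """Recover the A, G, C, T counts from the three class counts and the length.

    Uses R = A + G, M = A + C, S = G + C, hence R + M - S = 2A, and so on.
    Assumes seq contains only the letters A, C, G and T.
    """
    n = len(seq)
    counts = class_counts(seq)
    r, m, s = counts["R"], counts["M"], counts["S"]
    a = (r + m - s) // 2  # exact: r + m - s is always even
    g = (r + s - m) // 2
    c = (m + s - r) // 2
    t = n - a - g - c
    return {"A": a, "G": g, "C": c, "T": t}


if __name__ == "__main__":
    toy = "ATGGTGCACCTG"  # arbitrary toy string, not data from the paper
    print(class_counts(toy))
    print(base_content(toy))
```

Because each class count is a sum of two base counts, the three counts plus the sequence length form a solvable linear system, which is the sense in which no base-content information is lost under these assumed groupings.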