School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning 116024, PR China; School of Mathematics, Liaoning Normal University, Dalian, Liaoning 116029, PR China.
School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning 116024, PR China.
J Theor Biol. 2013 Nov 21;337:61-70. doi: 10.1016/j.jtbi.2013.07.028. Epub 2013 Aug 8.
Motivated by differences in sequence length, k-word based methods and graphical representation approaches have each uncovered biological information in their own distinct ways. However, the mechanisms of information storage are unlikely to vary with sequence length. A similarity distance suitable for sequences of various lengths should therefore be closer to those mechanisms. In this paper, new sub-sequences of k-words were extracted from biological sequences under a one-to-one mapping. The new sub-sequences were evaluated by a linear regression model, and a new distance was defined on the invariants from that model. Compared with other alignment-free distances, the results of four experiments demonstrated that our similarity distance was more efficient.
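For readers unfamiliar with k-word methods, the following is a minimal sketch of a generic k-word frequency distance, the kind of alignment-free baseline the abstract compares against. It is not the sub-sequence extraction or regression-based distance proposed in the paper, whose details the abstract does not give. The function names `kword_frequencies` and `kword_distance`, the DNA alphabet, and the choice of Euclidean distance are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
import math


def kword_frequencies(seq, k, alphabet="ACGT"):
    """Relative frequency of every k-word over `alphabet`, in a fixed order.

    Illustrative helper (hypothetical name); not taken from the paper.
    """
    # Count overlapping k-words by sliding a window one position at a time.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Enumerate all |alphabet|^k words so every sequence maps to the same axes.
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    # Only words made of alphabet symbols contribute (e.g. windows with 'N' are ignored).
    total = sum(counts[w] for w in words)
    return [counts[w] / total if total else 0.0 for w in words]


def kword_distance(a, b, k=2, alphabet="ACGT"):
    """Euclidean distance between the k-word frequency vectors of two sequences."""
    fa = kword_frequencies(a, k, alphabet)
    fb = kword_frequencies(b, k, alphabet)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
```

Because the counts are normalized to frequencies, sequences of different lengths map to vectors of the same dimension, which is what makes this kind of comparison possible without alignment.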