Department of Mathematics, School of Science, Anhui Science and Technology University, Fengyang, Anhui 233100, China.
Gene. 2013 Apr 15;518(2):419-24. doi: 10.1016/j.gene.2012.12.079. Epub 2013 Jan 23.
The K-mer-based approach has been widely used in similarity analysis to discover similarities and dissimilarities among biological sequences. In this study, we improve the traditional K-mer method and introduce a segmented K-mer approach (s-K-mer). Each primary sequence is divided into several segments, and all of these segments are simultaneously transformed into corresponding K-mer-based vectors. In this approach, it is vital to determine the optimal combination of the value of K, the number of segments, and the distance metric, i.e., (K*, s*, d*). Based on the cascaded feature vectors transformed from the s* segments of each sequence, we analyze 34 mammalian genome sequences using the proposed s-K-mer approach and compare the results with those of the traditional K-mer method. The comparative analysis demonstrates that the s-K-mer approach outperforms the traditional K-mer method in similarity analysis among different species.
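The segment-then-concatenate idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify implementation details, so everything below is an assumption made for illustration: the function names (`kmer_vector`, `split_segments`, `s_kmer_vector`, `euclidean`) are hypothetical, segments are taken to be contiguous and of near-equal length, each segment's K-mer counts are normalized to frequencies, and Euclidean distance stands in for whichever metric d* the authors selected.

```python
from itertools import product
import math

ALPHABET = "ACGT"  # assumption: DNA sequences; K-mers with other symbols are skipped


def kmer_vector(seq, k):
    """Return the K-mer frequency vector of seq (length 4**k)."""
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    # Normalizing to frequencies is an assumption, not taken from the abstract.
    return [c / total for c in counts] if total else [0.0] * len(counts)


def split_segments(seq, s):
    """Split seq into s contiguous segments of near-equal length (assumed scheme)."""
    n = len(seq)
    bounds = [i * n // s for i in range(s + 1)]
    return [seq[bounds[i]:bounds[i + 1]] for i in range(s)]


def s_kmer_vector(seq, k, s):
    """Concatenate ("cascade") the K-mer vectors of the s segments of seq."""
    vec = []
    for segment in split_segments(seq.upper(), s):
        vec.extend(kmer_vector(segment, k))
    return vec


def euclidean(u, v):
    """Euclidean distance, used here as a stand-in for the chosen metric."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

With s = 1 this reduces to the ordinary K-mer vector. A toy case suggests why segmenting can help: `"AAAACCCC"` and `"CCCCAAAA"` have identical 1-mer frequencies, so their distance is zero with s = 1, whereas with s = 2 the concatenated vectors differ because the composition of each half is recorded separately.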