Department of Computer Science and Engineering, Narula Institute of Technology, Kolkata, India.
Department of Computer Science, Barasat College, Kolkata, India.
Genomics. 2018 Sep;110(5):263-273. doi: 10.1016/j.ygeno.2017.11.003. Epub 2017 Nov 24.
Several proteins and genes are members of families that share a common evolutionary history. Sequence comparison has therefore emerged as an important means of outlining evolutionary relationships and recognizing conserved patterns. The current work critically investigates the role of k-mer length in the composition vector method for comparing genome sequences. Composition vector methods are generally applied with different choices of k; some values of k give satisfactory results, while others do not. In the proposed work, the standard composition vector method is carried out with 3-mers (k = 3), and a special type of information-based similarity index is used as the distance measure. The work establishes that the combination of 3-mers and the information-based similarity index provides satisfactory results in all cases, especially for the comparison of whole genome sequences. These choices provide a kind of unified approach to the comparison of genome sequences.
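As a rough illustration of the idea, the sketch below builds 3-mer frequency vectors for two DNA sequences and compares them. It rests on several assumptions that are not taken from the paper: the function names `composition_vector` and `jensen_shannon` are hypothetical, the vector is a plain normalized k-mer frequency vector (any background correction that a full composition vector method may apply is omitted), and the Jensen-Shannon divergence is used only as a stand-in, since the abstract does not define its information-based similarity index.

```python
from collections import Counter
from itertools import product
from math import log2

ALPHABET = "ACGT"  # assumption: DNA sequences; other symbols are skipped


def composition_vector(seq, k=3):
    """Return normalized k-mer frequencies in fixed lexicographic order.

    Simplified sketch: raw frequencies only, no background subtraction.
    """
    seq = seq.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Enumerate all 4**k possible k-mers so every vector has the same layout.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Windows containing symbols outside ALPHABET (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[m] / total for m in kmers]


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint).

    Stand-in measure; NOT the similarity index used in the paper.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i == 0 contribute 0.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


if __name__ == "__main__":
    v1 = composition_vector("ACGTACGTACGTTTGA")
    v2 = composition_vector("ACGTACGAACGTTAGA")
    print(len(v1), jensen_shannon(v1, v2))
```

With k = 3 each sequence maps to a vector of 4^3 = 64 entries regardless of its length, which is what makes whole genomes of different sizes directly comparable without alignment.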