Kumar Rajnish, Mishra Bharat Kumar, Lahiri Tapobrata, Kumar Gautam, Kumar Nilesh, Gupta Rahul, Pal Manoj Kumar
Department of Applied Science, Indian Institute of Information Technology - Allahabad, Allahabad, UP, 211012, India.
Interdiscip Sci. 2017 Jun;9(2):173-183. doi: 10.1007/s12539-015-0136-5. Epub 2016 Jan 29.
Retrieving homologous nucleotide sequences online from a sequence database with existing alignment techniques is common practice. These techniques depend on local alignment and scoring matrices, and their reliability is limited by computational complexity and accuracy. To address this, this work offers a novel numerical representation of genes that can further help divide the data space into smaller partitions and so support the formation of a search tree. Specifically, the paper introduces a 36-dimensional Periodicity Count Value (PCV) that represents a particular nucleotide sequence and is adapted from the stochastic model of Kolekar et al. (American Institute of Physics 1298:307-312, 2010. doi: 10.1063/1.3516320). The PCV construct uses information on the physicochemical properties of nucleotides and their positional distribution pattern within a gene. The PCV representation of a gene is observed to reduce the computational cost of calculating the distance between a pair of genes while remaining consistent with existing methods. The validity of the PCV-based method was further tested by using it to construct molecular phylogenies and comparing the results with those obtained using existing sequence alignment methods.
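The cost argument can be illustrated with a minimal sketch. Once each gene is reduced to a fixed-length vector, a pairwise distance takes a number of operations that depends only on the vector length (36 here), not on the sequence lengths. The abstract does not say how the 36 components are computed or which distance measure is used, so the Euclidean distance, the function names `euclidean` and `distance_matrix`, and the placeholder vectors below are all assumptions for illustration, not the authors' method.

```python
import math
from itertools import combinations

DIM = 36  # dimensionality of the PCV as stated in the abstract


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors.

    Assumption: the abstract does not name the distance measure used.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(vectors):
    """Symmetric pairwise distances for a dict mapping gene name -> vector."""
    names = list(vectors)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        dist[a][b] = dist[b][a] = euclidean(vectors[a], vectors[b])
    return names, dist


# Placeholder vectors only; these are NOT real PCV values.
gene_a = [0.0] * DIM
gene_b = [1.0] * DIM
gene_c = [0.0] * (DIM - 1) + [3.0]

names, dist = distance_matrix({"a": gene_a, "b": gene_b, "c": gene_c})
```

A matrix of this kind is the usual input to distance-based tree-building methods, which is one way such vectors could feed a phylogeny comparison like the one the abstract mentions.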