Computer Science and Engineering, Narula Institute of Technology, Kolkata, India.
Gene. 2020 Mar 10;730:144257. doi: 10.1016/j.gene.2019.144257. Epub 2019 Nov 21.
Genetic sequence analysis, the classification of genome sequences, and the inference of evolutionary relationships between species from their biological sequences are emerging research areas in bioinformatics. Several methods have already been applied to DNA sequence comparison under tri-nucleotide representation. In this paper, a new form of tri-nucleotide representation is proposed for sequence comparison. The comparison is alignment-free: it does not depend on aligning the sequences. The representation takes the biochemical properties of the nucleotides into account. The novelty of the method is that sequences of unequal length are represented by vectors of the same length, and each tri-nucleotide formed from a given sequence has a unique representation. The method is validated on several data sets of mammals, viruses, and bacteria. Its results are compared with those obtained by the probabilistic method, the natural vector method, the Fourier power spectrum method, the multiple encoding vector method, and the feature frequency profiles method. The method produces accurate phylogenies in all cases, and its time complexity is also shown to be comparatively low.
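As an illustration of how sequences of unequal length can be mapped to vectors of the same length without alignment, the following is a minimal sketch that uses plain overlapping tri-nucleotide frequencies. It is not the representation proposed in the paper, which additionally encodes biochemical properties of the nucleotides that the abstract does not specify. The names trinucleotide_vector and euclidean_distance, the 64-entry ordering, and the choice of Euclidean distance are assumptions made for this example only.

```python
from itertools import product
import math

NUCLEOTIDES = "ACGT"
# All 64 possible tri-nucleotides in a fixed order, so every vector shares one layout.
TRINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=3)]
INDEX = {tri: i for i, tri in enumerate(TRINUCLEOTIDES)}


def trinucleotide_vector(sequence):
    """Return a 64-dimensional vector of overlapping tri-nucleotide frequencies.

    Assumption for illustration: plain relative frequencies, with no
    biochemical weighting of the nucleotides.
    """
    seq = sequence.upper()
    counts = [0] * len(TRINUCLEOTIDES)
    total = 0
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:  # skip windows containing ambiguous bases such as N
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [c / total for c in counts]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    short = trinucleotide_vector("ATGCGTAC")
    longer = trinucleotide_vector("ATGCGTACGTTAGCATGCA")
    # Both vectors have 64 entries even though the sequences differ in length.
    print(len(short), len(longer))
    print(euclidean_distance(short, longer))
```

Pairwise distances computed this way could be collected into a distance matrix and passed to a tree-building procedure of the reader's choice; which procedure the authors used is not stated in the abstract.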