College of Science, Northeast Forestry University, Hexing Road 26, Harbin, Heilongjiang Province, 150040, PR China.
J Mol Graph Model. 2021 Sep;107:107942. doi: 10.1016/j.jmgm.2021.107942. Epub 2021 May 23.
Sequence alignment is an important research direction in bioinformatics and plays a vital role in biological research and development. Frequency chaos game representation (FCGR) maps a genome sequence to an image and stores rich genetic information in the resulting FCGR graphic. For each FCGR image, we construct a perceptual image hashing (PIH) matrix using bicubic interpolation zooming. The difference between the perceptual hash matrices of each pair of images is then calculated and used as the clustering distance between the corresponding two gene sequences. In this paper, we aligned and analyzed several typical genome sequence datasets, including mammalian mitochondrial genes, human immunodeficiency virus 1 (HIV-1), and hepatitis E virus (HEV), to build their evolutionary trees. Experimental results showed that our method combining PIH with FCGR (FCGR-PIH) achieves classification accuracy similar to that of the classical Clustal W sequence alignment method. Furthermore, 25 complete mitochondrial DNA sequences of cichlid fishes and 27 Escherichia coli/Shigella full genome sequences were selected from the AFproject test platform for testing. The performance benchmark rankings demonstrate the effectiveness of the FCGR-PIH algorithm and its potential for large-scale genome sequence analysis.
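The pipeline described above (sequence to FCGR image, image to hash matrix, hash difference to distance) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the corner assignment of the four bases, the k-mer length `k`, the hash size, and thresholding against the mean. The function names `fcgr`, `shrink`, `phash`, and `hash_distance` are hypothetical, and block averaging stands in for the bicubic interpolation zooming named in the abstract so that the example needs no image library.

```python
# Corner assignment is one common convention (an assumption; conventions vary).
CORNER = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k=4):
    """Return a 2^k x 2^k grid of normalized k-mer frequencies (FCGR)."""
    n = 2 ** k
    grid = [[0] * n for _ in range(n)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(c not in CORNER for c in kmer):
            continue  # skip k-mers containing ambiguous bases such as N
        x = y = 0
        # In the chaos game the last base selects the coarsest quadrant,
        # so it must supply the most significant bit of each coordinate.
        for c in reversed(kmer):
            bx, by = CORNER[c]
            x = (x << 1) | bx
            y = (y << 1) | by
        grid[y][x] += 1
    total = sum(map(sum, grid)) or 1
    return [[v / total for v in row] for row in grid]


def shrink(grid, size):
    """Block-average a square grid down to size x size.

    Simple stand-in for bicubic zooming; assumes size divides len(grid).
    """
    b = len(grid) // size
    return [
        [
            sum(grid[r * b + i][c * b + j] for i in range(b) for j in range(b))
            / (b * b)
            for c in range(size)
        ]
        for r in range(size)
    ]


def phash(grid, size=8):
    """Binary hash matrix: 1 where the shrunken cell exceeds the mean (assumed rule)."""
    small = shrink(grid, size)
    mean = sum(map(sum, small)) / (size * size)
    return [[1 if v > mean else 0 for v in row] for row in small]


def hash_distance(h1, h2):
    """Fraction of positions at which two equally sized hash matrices differ."""
    diff = sum(a != b for r1, r2 in zip(h1, h2) for a, b in zip(r1, r2))
    return diff / (len(h1) * len(h1[0]))


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    seqs = {"s1": "ACGTACGTTGCA" * 20, "s2": "ACGTACGATGCA" * 20}
    hashes = {name: phash(fcgr(s)) for name, s in seqs.items()}
    print(hash_distance(hashes["s1"], hashes["s2"]))
```

Computing `hash_distance` for every pair of sequences yields a distance matrix that a standard tree-building method could take as input. Swapping `shrink` for a true bicubic resize from an image library would bring the sketch closer to the procedure the abstract describes.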