Hoang Tung, Yin Changchuan, Zheng Hui, Yu Chenglong, Lucy He Rong, Yau Stephen S-T
Department of Mathematics, Statistics and Computer Science, University of Illinois at Chicago, Chicago, IL 60607, USA.
Mind and Brain Theme, South Australian Health and Medical Research Institute, North Terrace, Adelaide, SA 5000, Australia; School of Medicine, Flinders University, Adelaide, SA 5001, Australia.
J Theor Biol. 2015 May 7;372:135-45. doi: 10.1016/j.jtbi.2015.02.026. Epub 2015 Mar 5.
A novel clustering method is proposed to classify genes and genomes. For a given DNA sequence, a binary indicator sequence is constructed for each of the four nucleotides, and the discrete Fourier transform is applied to these four sequences to obtain their power spectra. Mathematical moments are computed from these spectra, and the moments are assembled into a multidimensional vector of real numbers. Cluster analysis is then performed on these vectors to determine the evolutionary relationships between DNA sequences. The novelty of the method is that sequences of different lengths can be compared easily through their power spectra and moments. Experimental results on various datasets show that the proposed method is an efficient tool for classifying genes and genomes. It gives results comparable to those of multiple sequence alignment and other alignment-free methods while being remarkably faster.
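The pipeline described in the abstract (indicator sequences, DFT, power spectra, moments, fixed-length vectors) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the moment formula, so the normalization by `count * (n - count)`, the exclusion of the zero-frequency term, the default of three moments per nucleotide, and the Euclidean distance are all assumptions, and the names `indicator_sequences`, `moment_vector`, and `pairwise_distances` are hypothetical.

```python
import numpy as np

NUCLEOTIDES = "ACGT"


def indicator_sequences(seq):
    """Return a 4 x N array; row i is 1.0 wherever seq equals NUCLEOTIDES[i]."""
    symbols = np.array(list(seq.upper()))
    return np.array([(symbols == nt).astype(float) for nt in NUCLEOTIDES])


def moment_vector(seq, n_moments=3):
    """Map a DNA sequence of any length to a vector of 4 * n_moments numbers."""
    n = len(seq)
    indicators = indicator_sequences(seq)
    # Power spectrum of each indicator sequence: squared magnitude of its DFT.
    spectra = np.abs(np.fft.fft(indicators, axis=1)) ** 2
    features = []
    for row, spectrum in zip(indicators, spectra):
        count = row.sum()
        # ASSUMPTION: normalization chosen for illustration only. By Parseval,
        # the power summed over k >= 1 equals count * (n - count).
        scale = count * (n - count)
        for j in range(1, n_moments + 1):
            if scale == 0:
                # Nucleotide absent (or the only one present): spectrum is flat zero.
                features.append(0.0)
            else:
                # ASSUMPTION: skip k = 0, which only encodes the nucleotide count.
                features.append(np.sum(spectrum[1:] ** j) / scale ** (j - 1))
    return np.array(features)


def pairwise_distances(seqs, n_moments=3):
    """Euclidean distances between moment vectors (assumed metric)."""
    vecs = np.array([moment_vector(s, n_moments) for s in seqs])
    diff = vecs[:, None, :] - vecs[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

Because every sequence maps to a vector of the same dimension, sequences of different lengths can be compared directly. The resulting distance matrix could then be handed to any standard clustering routine, for example hierarchical clustering, to build a tree.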