A measure of DNA sequence similarity by Fourier Transform with applications on hierarchical clustering.

Authors

Yin Changchuan, Chen Ying, Yau Stephen S-T

Affiliations

College of Information Systems and Technology, University of Phoenix, Chicago, IL 60601, USA.

Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China.

Publication information

J Theor Biol. 2014 Oct 21;359:18-28. doi: 10.1016/j.jtbi.2014.05.043. Epub 2014 Jun 6.

Abstract

Multiple sequence alignment (MSA) is a prominent method for classifying DNA sequences, yet it is hampered by inherent limitations in computational complexity. Alignment-free methods have been developed over the past decade to compare and classify DNA sequences more efficiently than MSA. However, because most alignment-free methods are based on feature extraction, they may lose structural and functional information of DNA sequences and therefore may not fully reflect the actual differences among them. Alignment-free methods that conserve information are needed for more accurate comparison and classification of DNA sequences. We propose a new alignment-free similarity measure of DNA sequences using the Discrete Fourier Transform (DFT). In this method, we map each DNA sequence into four binary indicator sequences and apply the DFT to the indicator sequences to transform them into the frequency domain. The Euclidean distance between the full DFT power spectra of the DNA sequences is used as the similarity distance metric. To compare the DFT power spectra of DNA sequences of different lengths, we propose an even scaling method that extends the shorter DFT power spectra to the length of the longest sequence compared. After the DFT power spectra are evenly scaled, the DNA sequences are compared in a DFT frequency space of the same dimensionality. We assess the accuracy of the similarity metric in hierarchical clustering using simulated DNA and virus sequences. The results demonstrate that the DFT-based method is an effective and accurate measure of DNA sequence similarity.
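The pipeline described in the abstract (indicator sequences, DFT power spectrum, scaling to a common length, Euclidean distance) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, and two details are assumptions because the abstract does not specify them: the four indicator power spectra are summed into a single spectrum per sequence, and the shorter spectrum is stretched by plain linear interpolation as a stand-in for the paper's even scaling method. The DFT is computed directly in O(n^2) for clarity, and sequences are assumed to be non-empty.

```python
import cmath
import math

NUCLEOTIDES = "ACGT"


def indicator_sequences(seq):
    """Map a DNA string to four binary indicator sequences, one per nucleotide."""
    seq = seq.upper()
    return {nt: [1 if base == nt else 0 for base in seq] for nt in NUCLEOTIDES}


def dft_power(signal):
    """Power spectrum |X[k]|^2 of a real sequence, using a direct O(n^2) DFT."""
    n = len(signal)
    power = []
    for k in range(n):
        x_k = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                  for t, v in enumerate(signal))
        power.append(abs(x_k) ** 2)
    return power


def power_spectrum(seq):
    """Sum the power spectra of the four indicator sequences (assumption)."""
    spectra = [dft_power(s) for s in indicator_sequences(seq).values()]
    return [sum(values) for values in zip(*spectra)]


def stretch(spectrum, m):
    """Stretch a spectrum to length m by linear interpolation.

    Assumption: this stands in for the paper's even scaling method,
    whose exact formula is not given in the abstract.
    """
    n = len(spectrum)
    if n == m:
        return list(spectrum)
    out = []
    for k in range(m):
        q = k * (n - 1) / (m - 1)      # fractional position in the source
        r = int(math.floor(q))
        nxt = min(r + 1, n - 1)
        out.append(spectrum[r] + (q - r) * (spectrum[nxt] - spectrum[r]))
    return out


def dft_distance(seq_a, seq_b):
    """Euclidean distance between power spectra scaled to the longer length."""
    m = max(len(seq_a), len(seq_b))
    pa = stretch(power_spectrum(seq_a), m)
    pb = stretch(power_spectrum(seq_b), m)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pa, pb)))
```

Calling `dft_distance` on every pair of sequences yields a distance matrix that could then be passed to a hierarchical clustering routine. For long sequences, an FFT library would replace the direct DFT used here.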
