
A new method to cluster genomes based on cumulative Fourier power spectrum.

Affiliations

Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China.

Department of Mathematics, Statistics and Computer Science, University of Illinois at Chicago, IL 60607, USA.

Publication information

Gene. 2018 Oct 5;673:239-250. doi: 10.1016/j.gene.2018.06.042. Epub 2018 Jun 20.

DOI:10.1016/j.gene.2018.06.042
PMID:29935353
Abstract

Analyzing phylogenetic relationships with mathematical methods has long been important in bioinformatics. Quantitative research can interpret raw biological data in a precise way. Multiple Sequence Alignment (MSA) is frequently used to analyze biological evolution, but it is very time-consuming. When the scale of the data is large, alignment methods cannot finish the calculation in a reasonable time. Therefore, we present a new method that uses moments of the cumulative Fourier power spectrum to cluster DNA sequences. Each sequence is translated into a vector in Euclidean space. Distances between the vectors reflect the relationships between the sequences. The mapping between the spectra and the moment vectors is one-to-one, which means that no information in the power spectra is lost during the calculation. We cluster and classify several datasets, including Influenza A, primate, and human rhinovirus (HRV) datasets, to build phylogenetic trees. Results show that the newly proposed cumulative Fourier power spectrum method is much faster and more accurate than MSA and than another alignment-free method known as k-mer. The research provides new insights into the study of phylogeny, evolution, and efficient DNA comparison algorithms for large genomes. The computer programs for the cumulative Fourier power spectrum are available on GitHub (https://github.com/YaulabTsinghua/cumulative-Fourier-power-spectrum).
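
The pipeline described in the abstract (sequence, power spectrum, cumulative spectrum, moment vector, Euclidean distance) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is in the GitHub repository above. The binary indicator encoding per nucleotide, the removal of the zero-frequency term, the normalization by total power, and the moment definition (the mean of the j-th power of the normalized cumulative spectrum) are all assumptions made for illustration; the function names `cumulative_power_moments` and `moment_distance` are hypothetical.

```python
import numpy as np

NUCLEOTIDES = "ACGT"


def cumulative_power_moments(seq, n_moments=3):
    """Map a DNA sequence to a fixed-length vector (illustrative only).

    Assumed steps, not taken from the paper:
    1. one binary indicator sequence per nucleotide,
    2. discrete Fourier power spectrum of each indicator,
    3. cumulative sum of the spectrum without the zero-frequency term,
       scaled by the total power,
    4. moments taken as the mean of the j-th power of that curve.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    features = []
    for base in NUCLEOTIDES:
        indicator = np.fromiter((c == base for c in seq), dtype=float, count=n)
        power = np.abs(np.fft.fft(indicator)) ** 2
        cumulative = np.cumsum(power[1:])  # skip k = 0 (sequence composition)
        total = cumulative[-1]
        if total > 1e-12:  # leave an all-zero curve untouched
            cumulative = cumulative / total
        for j in range(1, n_moments + 1):
            features.append(float(np.mean(cumulative ** j)))
    return np.array(features)


def moment_distance(seq_a, seq_b, n_moments=3):
    """Euclidean distance between the moment vectors of two sequences."""
    va = cumulative_power_moments(seq_a, n_moments)
    vb = cumulative_power_moments(seq_b, n_moments)
    return float(np.linalg.norm(va - vb))


if __name__ == "__main__":
    print(moment_distance("ACGTACGTAC", "AAAAACCCCC"))
```

Because every sequence maps to a vector of the same length (4 × `n_moments` here), sequences of different lengths can be compared directly, and the resulting pairwise distance matrix could then be passed to any standard hierarchical clustering routine to build a tree.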


Similar articles

1
A new method to cluster genomes based on cumulative Fourier power spectrum.
Gene. 2018 Oct 5;673:239-250. doi: 10.1016/j.gene.2018.06.042. Epub 2018 Jun 20.
2
A new method to cluster DNA sequences using Fourier power spectrum.
J Theor Biol. 2015 May 7;372:135-45. doi: 10.1016/j.jtbi.2015.02.026. Epub 2015 Mar 5.
3
A novel clustering method via nucleotide-based Fourier power spectrum analysis.
J Theor Biol. 2011 Jun 21;279(1):83-9. doi: 10.1016/j.jtbi.2011.03.029. Epub 2011 Apr 2.
4
An improved model for whole genome phylogenetic analysis by Fourier transform.
J Theor Biol. 2015 Oct 7;382:99-110. doi: 10.1016/j.jtbi.2015.06.033. Epub 2015 Jul 4.
5
A novel method for comparative analysis of DNA sequences by Ramanujan-Fourier transform.
J Comput Biol. 2014 Dec;21(12):867-79. doi: 10.1089/cmb.2014.0120.
6
A measure of DNA sequence similarity by Fourier Transform with applications on hierarchical clustering.
J Theor Biol. 2014 Oct 21;359:18-28. doi: 10.1016/j.jtbi.2014.05.043. Epub 2014 Jun 6.
7
A Novel Real-Time Genome Comparison Method Using Discrete Wavelet Transform.
J Comput Biol. 2018 Apr;25(4):405-416. doi: 10.1089/cmb.2017.0115. Epub 2017 Dec 22.
8
Large-Scale Genome Comparison Based on Cumulative Fourier Power and Phase Spectra: Central Moment and Covariance Vector.
Comput Struct Biotechnol J. 2019 Jul 11;17:982-994. doi: 10.1016/j.csbj.2019.07.003. eCollection 2019.
9
Alignment method for spectrograms of DNA sequences.
IEEE Trans Inf Technol Biomed. 2010 Jan;14(1):3-9. doi: 10.1109/TITB.2009.2033052. Epub 2009 Sep 29.
10
On the quality of tree-based protein classification.
Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.

Cited by

1
Context dependent prediction in DNA sequence using neural networks.
PeerJ. 2022 Sep 20;10:e13666. doi: 10.7717/peerj.13666. eCollection 2022.
2
Identification of HIV Rapid Mutations Using Differences in Nucleotide Distribution over Time.
Genes (Basel). 2022 Jan 19;13(2):170. doi: 10.3390/genes13020170.
3
Full Chromosomal Relationships Between Populations and the Origin of Humans.
Front Genet. 2022 Feb 2;12:828805. doi: 10.3389/fgene.2021.828805. eCollection 2021.
4
Analysis of the Hosts and Transmission Paths of SARS-CoV-2 in the COVID-19 Outbreak.
Genes (Basel). 2020 Jun 9;11(6):637. doi: 10.3390/genes11060637.
5
Large-Scale Genome Comparison Based on Cumulative Fourier Power and Phase Spectra: Central Moment and Covariance Vector.
Comput Struct Biotechnol J. 2019 Jul 11;17:982-994. doi: 10.1016/j.csbj.2019.07.003. eCollection 2019.
6
SWSPM: A Novel Alignment-Free DNA Comparison Method Based on Signal Processing Approaches.
Evol Bioinform Online. 2019 May 30;15:1176934319849071. doi: 10.1177/1176934319849071. eCollection 2019.
7
A Novel Approach to Clustering Genome Sequences Using Inter-nucleotide Covariance.
Front Genet. 2019 Apr 9;10:234. doi: 10.3389/fgene.2019.00234. eCollection 2019.