Large-Scale Genome Comparison Based on Cumulative Fourier Power and Phase Spectra: Central Moment and Covariance Vector.

Authors

Pei Shaojun, Dong Rui, He Rong Lucy, Yau Stephen S-T

Affiliations

Department of Mathematical Sciences, Tsinghua University, Beijing, PR China.

Department of Biological Sciences, Chicago State University, Chicago, IL 60628, USA.

Publication information

Comput Struct Biotechnol J. 2019 Jul 11;17:982-994. doi: 10.1016/j.csbj.2019.07.003. eCollection 2019.

DOI:10.1016/j.csbj.2019.07.003
PMID:31384399
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6661692/
Abstract

Genome comparison is a vital research area of bioinformatics. For large-scale genome comparisons, Multiple Sequence Alignment (MSA) methods are impractical because of their algorithmic complexity. In this study, we propose a novel alignment-free method based on the one-to-one correspondence between a DNA sequence and its complete central moment vector of the cumulative Fourier power and phase spectra. In addition, the covariance between the four nucleotides in the power and phase spectra is included. We use the cumulative Fourier power and phase spectra to define a 28-dimensional vector for each DNA sequence. Euclidean distances between these vectors measure the dissimilarity between DNA sequences. We test the method on datasets of different sizes and types, including simulated DNA sequences, exon-intron sequences, and complete genomes. The results show that our method is more accurate and efficient for hierarchical clustering than other alignment-free methods and MSA methods.
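
The abstract does not spell out how the 28 components are assembled, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes one layout that happens to give 28 numbers: for each of the power and phase spectra, the mean and variance of the cumulative spectrum of each nucleotide (4 × 2 = 8) plus the covariance of each nucleotide pair (6), so 14 per spectrum type. The function names, the choice of moments, the absence of any normalization, and the decision to skip the zero-frequency term are all assumptions made for illustration. A naive DFT is used to keep the example dependency-free; an FFT library would be the practical choice for real genomes.

```python
import cmath
import math
from itertools import combinations

NUCLEOTIDES = "ACGT"

def dft(signal):
    """Naive O(n^2) discrete Fourier transform (adequate for short demo sequences)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]

def cumulative_spectra(seq, nucleotide):
    """Cumulative power and phase spectra of one nucleotide's 0/1 indicator sequence."""
    indicator = [1.0 if base == nucleotide else 0.0 for base in seq.upper()]
    coeffs = dft(indicator)[1:]  # assumption: skip the zero-frequency term
    cum_power, cum_phase = [], []
    power_total = phase_total = 0.0
    for c in coeffs:
        power_total += abs(c) ** 2
        phase_total += cmath.phase(c)
        cum_power.append(power_total)
        cum_phase.append(phase_total)
    return cum_power, cum_phase

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Population covariance; covariance(xs, xs) is the second central moment."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def feature_vector(seq):
    """Hypothetical 28-component layout: 14 features each for power and phase."""
    power, phase = {}, {}
    for nt in NUCLEOTIDES:
        power[nt], phase[nt] = cumulative_spectra(seq, nt)
    features = []
    for spectra in (power, phase):
        for nt in NUCLEOTIDES:
            features.append(mean(spectra[nt]))
            features.append(covariance(spectra[nt], spectra[nt]))
        for a, b in combinations(NUCLEOTIDES, 2):
            features.append(covariance(spectra[a], spectra[b]))
    return features

def euclidean_distance(u, v):
    """Dissimilarity between two sequences' feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Because every sequence maps to a vector of the same length regardless of how many bases it has, `euclidean_distance(feature_vector(s1), feature_vector(s2))` can be computed for sequences of different lengths without alignment, and the resulting pairwise distances can be handed to any hierarchical clustering routine.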

