A novel alignment-free DNA sequence similarity analysis approach based on top-k n-gram match-up.

Authors

Delibaş Emre, Arslan Ahmet, Şeker Abdulkadir, Diri Banu

Affiliations

Department of Computer Engineering, Faculty of Engineering, Sivas Cumhuriyet University, 58140, Sivas, Turkey.

Department of Computer Engineering, Faculty of Engineering, Selçuk University, 42250, Konya, Turkey.

Publication information

J Mol Graph Model. 2020 Nov;100:107693. doi: 10.1016/j.jmgm.2020.107693. Epub 2020 Aug 7.

DOI: 10.1016/j.jmgm.2020.107693
PMID: 32805559

Abstract

DNA sequence similarity analysis is an essential task in computational biology and bioinformatics. In nearly all research that explores evolutionary relationships, gene function analysis, protein structure prediction and sequence retrieving, it is necessary to perform similarity calculations. As an alternative to alignment-based sequence comparison methods, which result in high computational cost, alignment-free methods have emerged that calculate similarity by digitizing the sequence in a different space. In this paper, we proposed an alignment-free DNA sequence similarity analysis method based on top-k n-gram matches, with the prediction that common repeating DNA subsections indicate high similarity between DNA sequences. In our method, we determined DNA sequence similarities by measuring similarity among feature vectors created according to top-k n-gram match-up scores without the use of similarity functions. We applied the similarity calculation for three different DNA data sets of different lengths. The phylogenetic relationships revealed by our method show that our trees coincide almost completely with the results of the MEGA software, which is based on sequence alignment. Our findings show that a certain number of frequently recurring common sequence patterns have the power to characterize DNA sequences.
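
The abstract does not spell out how the top-k n-gram match-up score is computed, so the following is only a minimal sketch of the general idea under stated assumptions: each sequence is summarized by its k most frequent overlapping n-grams, and two sequences are scored by the fraction of those n-grams they share. The function names (`top_k_ngrams`, `matchup_score`, `feature_vectors`), the overlap-based score, and the defaults `n=3`, `k=5` are hypothetical choices for illustration, not the authors' implementation.

```python
from collections import Counter


def top_k_ngrams(seq, n, k):
    """Return the k most frequent overlapping n-grams of seq with their counts."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    # Sort by count (descending), then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:k])


def matchup_score(seq_a, seq_b, n=3, k=5):
    """Assumed score: fraction of top-k n-grams that the two sequences share."""
    top_a = top_k_ngrams(seq_a, n, k)
    top_b = top_k_ngrams(seq_b, n, k)
    shared = set(top_a) & set(top_b)
    # A short sequence may have fewer than k distinct n-grams.
    denom = max(len(top_a), len(top_b), 1)
    return len(shared) / denom


def feature_vectors(seqs, n=3, k=5):
    """Build one feature vector per sequence: its score against every sequence."""
    return [[matchup_score(a, b, n, k) for b in seqs] for a in seqs]
```

In this sketch, each row returned by `feature_vectors` could then be compared with the other rows to build a distance matrix for tree construction; how the paper actually performs that step is not stated in the abstract.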

Similar articles

1
A novel alignment-free DNA sequence similarity analysis approach based on top-k n-gram match-up.
J Mol Graph Model. 2020 Nov;100:107693. doi: 10.1016/j.jmgm.2020.107693. Epub 2020 Aug 7.
2
DNA sequence similarity analysis using image texture analysis based on first-order statistics.
J Mol Graph Model. 2020 Sep;99:107603. doi: 10.1016/j.jmgm.2020.107603. Epub 2020 May 3.
3
A measure of DNA sequence similarity by Fourier Transform with applications on hierarchical clustering.
J Theor Biol. 2014 Oct 21;359:18-28. doi: 10.1016/j.jtbi.2014.05.043. Epub 2014 Jun 6.
4
A novel method for comparative analysis of DNA sequences by Ramanujan-Fourier transform.
J Comput Biol. 2014 Dec;21(12):867-79. doi: 10.1089/cmb.2014.0120.
5
Bayesian coestimation of phylogeny and sequence alignment.
BMC Bioinformatics. 2005 Apr 1;6:83. doi: 10.1186/1471-2105-6-83.
6
Fast and accurate phylogeny reconstruction using filtered spaced-word matches.
Bioinformatics. 2017 Apr 1;33(7):971-979. doi: 10.1093/bioinformatics/btw776.
7
A new method to cluster genomes based on cumulative Fourier power spectrum.
Gene. 2018 Oct 5;673:239-250. doi: 10.1016/j.gene.2018.06.042. Epub 2018 Jun 20.
8
Optimization and Performance Analysis of CAT Method for DNA Sequence Similarity Searching and Alignment.
Genes (Basel). 2024 Mar 7;15(3):341. doi: 10.3390/genes15030341.
9
An alignment-free heuristic for fast sequence comparisons with applications to phylogeny reconstruction.
BMC Bioinformatics. 2020 Nov 18;21(Suppl 6):404. doi: 10.1186/s12859-020-03738-5.
10
A novel hierarchical clustering algorithm for gene sequences.
BMC Bioinformatics. 2012 Jul 23;13:174. doi: 10.1186/1471-2105-13-174.

Cited by

1
Visualization Methods for DNA Sequences: A Review and Prospects.
Biomolecules. 2024 Nov 14;14(11):1447. doi: 10.3390/biom14111447.
2
DNA N-gram analysis framework (DNAnamer): A generalized N-gram frequency analysis framework for the supervised classification of DNA sequences.
Heliyon. 2024 Aug 24;10(17):e36914. doi: 10.1016/j.heliyon.2024.e36914. eCollection 2024 Sep 15.
3
Application of Feature Definition and Quantification in Biological Sequence Analysis.
Curr Genomics. 2023 Oct 27;24(2):64-65. doi: 10.2174/1389202924666230816150732.