Efficient TF-IDF method for alignment-free DNA sequence similarity analysis.

Author information

Delibaş Emre

Affiliation

Department of Computer Engineering, Faculty of Engineering, Sivas Cumhuriyet University, 58140, Sivas, Turkey.

Publication information

J Mol Graph Model. 2025 Jun;137:109011. doi: 10.1016/j.jmgm.2025.109011. Epub 2025 Mar 15.

DOI: 10.1016/j.jmgm.2025.109011
PMID: 40107030
Abstract

This study proposes a pioneering alignment-free approach for the analysis of DNA sequence similarity. The method employs the representation of DNA sequences as n-grams, a technique that involves the adaptation of the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm to genomic data. The primary objective of this approach is to enhance the accuracy of the results while concomitantly reducing the computational costs of the process, by ascertaining the most informative n-grams. The approach adopted in this study successfully circumvents the limitations of both traditional alignment-based and alignment-free methods, thereby demonstrating a commendable level of performance. The proposed method was tested on three different datasets and achieved high agreement with reference phylogenetic trees in the AFProject benchmark system. The results demonstrate that TF-IDF-based similarity matrices effectively capture phylogenetic relationships and significantly reduce processing time. The high accuracy rates obtained prove that the method offers a scalable and robust alternative in large genomic datasets. The method demonstrates considerable potential in DNA sequence similarity analysis, exhibiting high accuracy and low computational cost.
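
The general idea described in the abstract, treating each DNA sequence as a document of overlapping n-grams and comparing TF-IDF vectors, can be illustrated with a minimal Python sketch. This is not the author's implementation: the abstract does not state the n-gram length, the exact TF-IDF weighting, the similarity measure, or how the most informative n-grams are selected. The sketch assumes n = 3, a smoothed IDF, and cosine similarity, and it skips the n-gram selection step entirely. All function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def ngrams(seq, n):
    """Return all overlapping n-grams (k-mers) of a DNA sequence."""
    seq = seq.upper()
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def tfidf_vectors(seqs, n=3):
    """Build one sparse TF-IDF vector (a dict) per sequence.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, so n-grams
    shared by every sequence keep a small nonzero weight.
    """
    counts = [Counter(ngrams(s, n)) for s in seqs]
    num_docs = len(seqs)

    # Document frequency: number of sequences containing each n-gram.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    vectors = []
    for c in counts:
        total = sum(c.values())
        vec = {}
        for gram, k in c.items():
            tf = k / total
            idf = math.log((1 + num_docs) / (1 + df[gram])) + 1
            vec[gram] = tf * idf
        vectors.append(vec)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(seqs, n=3):
    """Pairwise cosine similarity of TF-IDF n-gram vectors (assumed measure)."""
    vecs = tfidf_vectors(seqs, n)
    return [[cosine(a, b) for b in vecs] for a in vecs]


# Toy sequences, invented for illustration only.
toy_seqs = ["ACGTACGTAC", "ACGTACGTTC", "TTTTGGGGCC"]
sim = similarity_matrix(toy_seqs, n=3)
for row in sim:
    print(["%.3f" % x for x in row])
```

In this toy input the first two sequences share most of their 3-grams and score as similar, while the third shares none with the first and scores zero. A matrix of this kind, converted to distances (for example 1 minus similarity), is what would typically be handed to a tree-building method for comparison against a reference phylogeny.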
